Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1807.03765
Cited By
Is Q-learning Provably Efficient?
10 July 2018
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Is Q-learning Provably Efficient?"
50 / 222 papers shown
Title
Automatic Reward Shaping from Confounded Offline Data
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
OffRL
OnRL
39
0
0
16 May 2025
Exploration by Random Distribution Distillation
Zhirui Fang
Kai Yang
Jian Tao
Jiafei Lyu
Lusong Li
Li Shen
Xiu Li
16
0
0
16 May 2025
Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam
Thomas L. Griffiths
LLMAG
94
1
0
29 Apr 2025
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Moritz A. Zanger
Pascal R. van der Vaart
Wendelin Bohmer
M. Spaan
UQCV
BDL
251
0
0
14 Mar 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
70
24
0
20 Feb 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
69
2
0
13 Feb 2025
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
Fu-Chieh Chang
Yu-Ting Lee
Hui-Ying Shih
Pei-Yuan Wu
Pei-Yuan Wu
OffRL
LRM
243
0
0
31 Oct 2024
Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
Bo Yue
Jian Li
Guiliang Liu
39
2
0
24 Sep 2024
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
84
1
0
22 Aug 2024
Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
Taehyun Cho
Seung Han
Kyungjae Lee
Seokhun Ju
Dohyeong Kim
Jungwoo Lee
72
0
0
31 Jul 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal
Shipra Agrawal
60
4
0
18 Jul 2024
Learning to Steer Markovian Agents under Model Uncertainty
Jiawei Huang
Vinzenz Thoma
Zebang Shen
H. Nax
Niao He
52
2
0
14 Jul 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
1
0
08 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
Miao Lu
Han Zhong
Tong Zhang
Jose H. Blanchet
OffRL
OOD
79
6
0
04 Apr 2024
Horizon-Free Regret for Linear Markov Decision Processes
Zihan Zhang
Jason D. Lee
Yuxin Chen
Simon S. Du
35
3
0
15 Mar 2024
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Yu Chen
Xiangcheng Zhang
Siwei Wang
Longbo Huang
47
3
0
28 Feb 2024
Enhancing Reinforcement Learning Agents with Local Guides
Paul Daoudi
Bogdan Robu
Christophe Prieur
Ludovic Dos Santos
M. Barlier
OnRL
31
3
0
21 Feb 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
39
3
0
08 Feb 2024
Cascading Reinforcement Learning
Yihan Du
R. Srikant
Wei Chen
19
0
0
17 Jan 2024
Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation
Paul Daoudi
Mathias Formoso
Othman Gaizi
Achraf Azize
Evrard Garcelon
OffRL
31
0
0
24 Dec 2023
Multi-Agent Probabilistic Ensembles with Trajectory Sampling for Connected Autonomous Vehicles
Ruoqi Wen
Jiahao Huang
Rongpeng Li
Guoru Ding
Zhifeng Zhao
42
1
0
21 Dec 2023
A Concentration Bound for TD(0) with Function Approximation
Siddharth Chandak
Vivek Borkar
42
0
0
16 Dec 2023
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
Cassidy Laidlaw
Banghua Zhu
Stuart J. Russell
Anca Dragan
41
2
0
13 Dec 2023
A safe exploration approach to constrained Markov decision processes
Tingting Ni
Maryam Kamgarpour
54
3
0
01 Dec 2023
Policy Optimization via Adv2: Adversarial Learning on Advantage Functions
Matthieu Jonckheere
Chiara Mignacco
Gilles Stoltz
33
2
0
25 Oct 2023
When is Agnostic Reinforcement Learning Statistically Tractable?
Zeyu Jia
Gene Li
Alexander Rakhlin
Ayush Sekhari
Nathan Srebro
OffRL
37
5
0
09 Oct 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
34
9
0
05 Sep 2023
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
98
22
0
25 Jul 2023
Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
Tongxin Li
Yiheng Lin
Shaolei Ren
Adam Wierman
AAML
OffRL
44
6
0
20 Jul 2023
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
Ruiqi Zhang
Andrea Zanette
OffRL
OnRL
45
7
0
10 Jul 2023
Diverse Projection Ensembles for Distributional Reinforcement Learning
Moritz A. Zanger
Wendelin Bohmer
M. Spaan
38
4
0
12 Jun 2023
Provably Efficient Adversarial Imitation Learning with Unknown Transitions
Tian Xu
Ziniu Li
Yang Yu
Zhimin Luo
18
8
0
11 Jun 2023
High-probability sample complexities for policy evaluation with linear function approximation
Gen Li
Weichen Wu
Yuejie Chi
Cong Ma
Alessandro Rinaldo
Yuting Wei
OffRL
40
7
0
30 May 2023
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq
Qingfeng Lan
Pan Xu
A. R. Mahmood
Doina Precup
Anima Anandkumar
Kamyar Azizzadenesheli
BDL
OffRL
33
20
0
29 May 2023
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria
Fivos Kalogiannis
Ioannis Panageas
42
8
0
23 May 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
40
26
0
15 May 2023
Bayesian Reinforcement Learning with Limited Cognitive Load
Dilip Arumugam
Mark K. Ho
Noah D. Goodman
Benjamin Van Roy
OffRL
39
8
0
05 May 2023
Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes
Réda Alami
Mohammed Mahfoud
Eric Moulines
24
2
0
01 Apr 2023
Fast Rates for Maximum Entropy Exploration
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
A. Naumov
Pierre Perrault
Yunhao Tang
Michal Valko
Pierre Menard
49
18
0
14 Mar 2023
Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control
Amarildo Likmeta
Matteo Sacco
Alberto Maria Metelli
Marcello Restelli
OffRL
28
3
0
04 Mar 2023
Provably Efficient Reinforcement Learning via Surprise Bound
Hanlin Zhu
Ruosong Wang
Jason D. Lee
OffRL
30
5
0
22 Feb 2023
Reinforcement Learning with Function Approximation: From Linear to Nonlinear
Jihao Long
Jiequn Han
39
5
0
20 Feb 2023
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Fang-yuan Kong
Xiangcheng Zhang
Baoxiang Wang
Shuai Li
36
12
0
14 Feb 2023
Robust Knowledge Transfer in Tiered Reinforcement Learning
Jiawei Huang
Niao He
OffRL
34
1
0
10 Feb 2023
Online Reinforcement Learning with Uncertain Episode Lengths
Debmalya Mandal
Goran Radanović
Jiarui Gan
Adish Singla
R. Majumdar
OffRL
38
5
0
07 Feb 2023
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
Yunchang Yang
Hangshi Zhong
Tianhao Wu
B. Liu
Liwei Wang
S. Du
OffRL
32
8
0
03 Feb 2023
User-centric Heterogeneous-action Deep Reinforcement Learning for Virtual Reality in the Metaverse over Wireless Networks
Wen-li Yu
Terence Jie Chua
Junfeng Zhao
EgoV
45
18
0
03 Feb 2023
Sample Complexity of Kernel-Based Q-Learning
Sing-Yuan Yeh
Fu-Chieh Chang
Chang-Wei Yueh
Pei-Yuan Wu
A. Bernacchia
Sattar Vakili
OffRL
35
4
0
01 Feb 2023
Learning in POMDPs is Sample-Efficient with Hindsight Observability
Jonathan Lee
Alekh Agarwal
Christoph Dann
Tong Zhang
39
20
0
31 Jan 2023
1
2
3
4
5
Next