ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1807.03765
  4. Cited By
Is Q-learning Provably Efficient?

Is Q-learning Provably Efficient?

10 July 2018
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
    OffRL
ArXivPDFHTML

Papers citing "Is Q-learning Provably Efficient?"

50 / 222 papers shown
Title
Automatic Reward Shaping from Confounded Offline Data
Automatic Reward Shaping from Confounded Offline Data
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
OffRL
OnRL
39
0
0
16 May 2025
Exploration by Random Distribution Distillation
Exploration by Random Distribution Distillation
Zhirui Fang
Kai Yang
Jian Tao
Jiafei Lyu
Lusong Li
Li Shen
Xiu Li
16
0
0
16 May 2025
Toward Efficient Exploration by Large Language Model Agents
Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam
Thomas L. Griffiths
LLMAG
94
1
0
29 Apr 2025
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Moritz A. Zanger
Pascal R. van der Vaart
Wendelin Bohmer
M. Spaan
UQCV
BDL
251
0
0
14 Mar 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
70
24
0
20 Feb 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
69
2
0
13 Feb 2025
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
Fu-Chieh Chang
Yu-Ting Lee
Hui-Ying Shih
Pei-Yuan Wu
Pei-Yuan Wu
OffRL
LRM
243
0
0
31 Oct 2024
Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
Bo Yue
Jian Li
Guiliang Liu
39
2
0
24 Sep 2024
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
84
1
0
22 Aug 2024
Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
Taehyun Cho
Seung Han
Kyungjae Lee
Seokhun Ju
Dohyeong Kim
Jungwoo Lee
72
0
0
31 Jul 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal
Shipra Agrawal
60
4
0
18 Jul 2024
Learning to Steer Markovian Agents under Model Uncertainty
Learning to Steer Markovian Agents under Model Uncertainty
Jiawei Huang
Vinzenz Thoma
Zebang Shen
H. Nax
Niao He
52
2
0
14 Jul 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
1
0
08 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Distributionally Robust Reinforcement Learning with Interactive Data
  Collection: Fundamental Hardness and Near-Optimal Algorithm
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
Miao Lu
Han Zhong
Tong Zhang
Jose H. Blanchet
OffRL
OOD
79
6
0
04 Apr 2024
Horizon-Free Regret for Linear Markov Decision Processes
Horizon-Free Regret for Linear Markov Decision Processes
Zihan Zhang
Jason D. Lee
Yuxin Chen
Simon S. Du
35
3
0
15 Mar 2024
Provable Risk-Sensitive Distributional Reinforcement Learning with
  General Function Approximation
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Yu Chen
Xiangcheng Zhang
Siwei Wang
Longbo Huang
47
3
0
28 Feb 2024
Enhancing Reinforcement Learning Agents with Local Guides
Enhancing Reinforcement Learning Agents with Local Guides
Paul Daoudi
Bogdan Robu
Christophe Prieur
Ludovic Dos Santos
M. Barlier
OnRL
31
3
0
21 Feb 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy
  Coverage Suffices
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
39
3
0
08 Feb 2024
Cascading Reinforcement Learning
Cascading Reinforcement Learning
Yihan Du
R. Srikant
Wei Chen
19
0
0
17 Jan 2024
Conservative Exploration for Policy Optimization via Off-Policy Policy
  Evaluation
Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation
Paul Daoudi
Mathias Formoso
Othman Gaizi
Achraf Azize
Evrard Garcelon
OffRL
31
0
0
24 Dec 2023
Multi-Agent Probabilistic Ensembles with Trajectory Sampling for
  Connected Autonomous Vehicles
Multi-Agent Probabilistic Ensembles with Trajectory Sampling for Connected Autonomous Vehicles
Ruoqi Wen
Jiahao Huang
Rongpeng Li
Guoru Ding
Zhifeng Zhao
42
1
0
21 Dec 2023
A Concentration Bound for TD(0) with Function Approximation
A Concentration Bound for TD(0) with Function Approximation
Siddharth Chandak
Vivek Borkar
42
0
0
16 Dec 2023
The Effective Horizon Explains Deep RL Performance in Stochastic
  Environments
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
Cassidy Laidlaw
Banghua Zhu
Stuart J. Russell
Anca Dragan
41
2
0
13 Dec 2023
A safe exploration approach to constrained Markov decision processes
A safe exploration approach to constrained Markov decision processes
Tingting Ni
Maryam Kamgarpour
54
3
0
01 Dec 2023
Policy Optimization via Adv2: Adversarial Learning on Advantage Functions
Policy Optimization via Adv2: Adversarial Learning on Advantage Functions
Matthieu Jonckheere
Chiara Mignacco
Gilles Stoltz
33
2
0
25 Oct 2023
When is Agnostic Reinforcement Learning Statistically Tractable?
When is Agnostic Reinforcement Learning Statistically Tractable?
Zeyu Jia
Gene Li
Alexander Rakhlin
Ayush Sekhari
Nathan Srebro
OffRL
37
5
0
09 Oct 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon
  Average Reward Markov Decision Processes
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
34
9
0
05 Sep 2023
Settling the Sample Complexity of Online Reinforcement Learning
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
98
22
0
25 Jul 2023
Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with
  Q-Value Predictions
Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
Tongxin Li
Yiheng Lin
Shaolei Ren
Adam Wierman
AAML
OffRL
44
6
0
20 Jul 2023
Policy Finetuning in Reinforcement Learning via Design of Experiments
  using Offline Data
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
Ruiqi Zhang
Andrea Zanette
OffRL
OnRL
45
7
0
10 Jul 2023
Diverse Projection Ensembles for Distributional Reinforcement Learning
Diverse Projection Ensembles for Distributional Reinforcement Learning
Moritz A. Zanger
Wendelin Bohmer
M. Spaan
38
4
0
12 Jun 2023
Provably Efficient Adversarial Imitation Learning with Unknown
  Transitions
Provably Efficient Adversarial Imitation Learning with Unknown Transitions
Tian Xu
Ziniu Li
Yang Yu
Zhimin Luo
18
8
0
11 Jun 2023
High-probability sample complexities for policy evaluation with linear
  function approximation
High-probability sample complexities for policy evaluation with linear function approximation
Gen Li
Weichen Wu
Yuejie Chi
Cong Ma
Alessandro Rinaldo
Yuting Wei
OffRL
40
7
0
30 May 2023
Provable and Practical: Efficient Exploration in Reinforcement Learning
  via Langevin Monte Carlo
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq
Qingfeng Lan
Pan Xu
A. R. Mahmood
Doina Precup
Anima Anandkumar
Kamyar Azizzadenesheli
BDL
OffRL
33
20
0
29 May 2023
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient
  Computation of Nash Equilibria
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria
Fivos Kalogiannis
Ioannis Panageas
42
8
0
23 May 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in
  Linear Markov Decision Processes
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
40
26
0
15 May 2023
Bayesian Reinforcement Learning with Limited Cognitive Load
Bayesian Reinforcement Learning with Limited Cognitive Load
Dilip Arumugam
Mark K. Ho
Noah D. Goodman
Benjamin Van Roy
OffRL
39
8
0
05 May 2023
Restarted Bayesian Online Change-point Detection for Non-Stationary
  Markov Decision Processes
Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes
Réda Alami
Mohammed Mahfoud
Eric Moulines
24
2
0
01 Apr 2023
Fast Rates for Maximum Entropy Exploration
Fast Rates for Maximum Entropy Exploration
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
A. Naumov
Pierre Perrault
Yunhao Tang
Michal Valko
Pierre Menard
49
18
0
14 Mar 2023
Wasserstein Actor-Critic: Directed Exploration via Optimism for
  Continuous-Actions Control
Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control
Amarildo Likmeta
Matteo Sacco
Alberto Maria Metelli
Marcello Restelli
OffRL
28
3
0
04 Mar 2023
Provably Efficient Reinforcement Learning via Surprise Bound
Provably Efficient Reinforcement Learning via Surprise Bound
Hanlin Zhu
Ruosong Wang
Jason D. Lee
OffRL
30
5
0
22 Feb 2023
Reinforcement Learning with Function Approximation: From Linear to
  Nonlinear
Reinforcement Learning with Function Approximation: From Linear to Nonlinear
Jihao Long
Jiequn Han
39
5
0
20 Feb 2023
Improved Regret Bounds for Linear Adversarial MDPs via Linear
  Optimization
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Fang-yuan Kong
Xiangcheng Zhang
Baoxiang Wang
Shuai Li
36
12
0
14 Feb 2023
Robust Knowledge Transfer in Tiered Reinforcement Learning
Robust Knowledge Transfer in Tiered Reinforcement Learning
Jiawei Huang
Niao He
OffRL
34
1
0
10 Feb 2023
Online Reinforcement Learning with Uncertain Episode Lengths
Online Reinforcement Learning with Uncertain Episode Lengths
Debmalya Mandal
Goran Radanović
Jiarui Gan
Adish Singla
R. Majumdar
OffRL
38
5
0
07 Feb 2023
A Reduction-based Framework for Sequential Decision Making with Delayed
  Feedback
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
Yunchang Yang
Hangshi Zhong
Tianhao Wu
B. Liu
Liwei Wang
S. Du
OffRL
32
8
0
03 Feb 2023
User-centric Heterogeneous-action Deep Reinforcement Learning for
  Virtual Reality in the Metaverse over Wireless Networks
User-centric Heterogeneous-action Deep Reinforcement Learning for Virtual Reality in the Metaverse over Wireless Networks
Wen-li Yu
Terence Jie Chua
Junfeng Zhao
EgoV
45
18
0
03 Feb 2023
Sample Complexity of Kernel-Based Q-Learning
Sample Complexity of Kernel-Based Q-Learning
Sing-Yuan Yeh
Fu-Chieh Chang
Chang-Wei Yueh
Pei-Yuan Wu
A. Bernacchia
Sattar Vakili
OffRL
35
4
0
01 Feb 2023
Learning in POMDPs is Sample-Efficient with Hindsight Observability
Learning in POMDPs is Sample-Efficient with Hindsight Observability
Jonathan Lee
Alekh Agarwal
Christoph Dann
Tong Zhang
39
20
0
31 Jan 2023
12345
Next