Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1901.09311
Cited By
Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
27 January 2019
Kefan Dong
Yuanhao Wang
Xiaoyu Chen
Liwei Wang
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP"
33 / 33 papers shown
Title
Automatic Reward Shaping from Confounded Offline Data
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
OffRL
OnRL
33
0
0
16 May 2025
The Bandit Whisperer: Communication Learning for Restless Bandits
Yunfan Zhao
Tonghan Wang
Dheeraj M. Nagaraj
Aparna Taneja
Milind Tambe
54
5
0
11 Aug 2024
Learning to Steer Markovian Agents under Model Uncertainty
Jiawei Huang
Vinzenz Thoma
Zebang Shen
H. Nax
Niao He
48
2
0
14 Jul 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
34
3
0
08 Feb 2024
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
98
22
0
25 Jul 2023
Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time
Xiang Ji
Gen Li
OffRL
32
7
0
24 May 2023
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback
Yang Cai
Haipeng Luo
Chen-Yu Wei
Weiqiang Zheng
31
18
0
05 Mar 2023
Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor
Julien Grand-Clément
Marko Petrik
35
14
0
31 Jan 2023
Provable Reset-free Reinforcement Learning by No-Regret Reduction
Hoai-An Nguyen
Ching-An Cheng
OffRL
31
2
0
06 Jan 2023
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning
Yinglun Xu
Qi Zeng
Gagandeep Singh
AAML
40
6
0
30 May 2022
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies
Zihan Zhang
Xiangyang Ji
S. Du
30
21
0
24 Mar 2022
The Efficacy of Pessimism in Asynchronous Q-Learning
Yuling Yan
Gen Li
Yuxin Chen
Jianqing Fan
OffRL
78
40
0
14 Mar 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
Liyu Chen
R. Jain
Haipeng Luo
57
25
0
31 Jan 2022
Recent Advances in Reinforcement Learning in Finance
B. Hambly
Renyuan Xu
Huining Yang
OffRL
29
167
0
08 Dec 2021
Interesting Object, Curious Agent: Learning Task-Agnostic Exploration
Simone Parisi
Victoria Dean
Deepak Pathak
Abhinav Gupta
LM&Ro
40
50
0
25 Nov 2021
Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning
Yuanzhi Li
Ruosong Wang
Lin F. Yang
27
20
0
01 Nov 2021
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
Gen Li
Laixi Shi
Yuxin Chen
Yuejie Chi
OffRL
47
51
0
09 Oct 2021
A Survey of Exploration Methods in Reinforcement Learning
Susan Amin
Maziar Gomrokchi
Harsh Satija
H. V. Hoof
Doina Precup
OffRL
34
80
0
01 Sep 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
38
44
0
18 Jul 2021
A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes
Honghao Wei
Xin Liu
Lei Ying
21
21
0
03 Jun 2021
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Jean Tarbouriech
Runlong Zhou
S. Du
Matteo Pirotta
M. Valko
A. Lazaric
59
35
0
22 Apr 2021
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
Gen Li
Changxiao Cai
Ee
Yuting Wei
Yuejie Chi
OffRL
50
75
0
12 Feb 2021
Control with adaptive Q-learning
J. Araújo
Mário A. T. Figueiredo
M. Botto
33
2
0
03 Nov 2020
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
Jiafan He
Dongruo Zhou
Quanquan Gu
21
37
0
01 Oct 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
17
104
0
28 Sep 2020
Single-partition adaptive Q-learning
J. Araújo
Mário A. T. Figueiredo
M. Botto
OffRL
20
2
0
14 Jul 2020
A Provably Efficient Sample Collection Strategy for Reinforcement Learning
Jean Tarbouriech
Matteo Pirotta
Michal Valko
A. Lazaric
OffRL
25
16
0
13 Jul 2020
Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping
Dongruo Zhou
Jiafan He
Quanquan Gu
30
133
0
23 Jun 2020
Q
Q
Q
-learning with Logarithmic Regret
Kunhe Yang
Lin F. Yang
S. Du
43
59
0
16 Jun 2020
Multi-Agent Reinforcement Learning in Stochastic Networked Systems
Yiheng Lin
Guannan Qu
Longbo Huang
Adam Wierman
34
38
0
11 Jun 2020
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Qiaomin Xie
Yudong Chen
Zhaoran Wang
Zhuoran Yang
39
124
0
17 Feb 2020
Adaptive Approximate Policy Iteration
Botao Hao
N. Lazić
Yasin Abbasi-Yadkori
Pooria Joulani
Csaba Szepesvári
18
14
0
08 Feb 2020
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
107
100
0
15 Oct 2019
1