Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1802.04020
Cited By
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
12 February 2018
Ronan Fruit
Matteo Pirotta
A. Lazaric
R. Ortner
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning"
39 / 39 papers shown
Title
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
78
2
0
10 Oct 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal
Shipra Agrawal
63
4
0
18 Jul 2024
Dealing with unbounded gradients in stochastic saddle-point optimization
Gergely Neu
Nneka Okolo
52
4
0
21 Feb 2024
Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
Bhargav Ganguly
Yang Xu
Vaneet Aggarwal
34
0
0
18 Oct 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
37
11
0
05 Sep 2023
Learning Optimal Admission Control in Partially Observable Queueing Networks
Jonatha Anselmi
B. Gaujal
Louis-Sébastien Rebuffi
39
1
0
04 Aug 2023
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
98
22
0
25 Jul 2023
Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space
Jonatha Anselmi
B. Gaujal
Louis-Sébastien Rebuffi
51
2
0
21 Feb 2023
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
Runlong Zhou
Zihan Zhang
S. Du
49
10
0
31 Jan 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
55
12
0
30 Jan 2023
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Xuefeng Gao
X. Zhou
46
8
0
23 May 2022
From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses
D. Tiapkin
Denis Belomestny
Eric Moulines
A. Naumov
S. Samsonov
Yunhao Tang
Michal Valko
Pierre Menard
60
19
0
16 May 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
Liyu Chen
R. Jain
Haipeng Luo
72
25
0
31 Jan 2022
Bad-Policy Density: A Measure of Reinforcement Learning Hardness
David Abel
Cameron Allen
Dilip Arumugam
D Ellis Hershkowitz
Michael L. Littman
Lawson L. S. Wong
31
2
0
07 Oct 2021
Understanding Domain Randomization for Sim-to-real Transfer
Xiaoyu Chen
Jiachen Hu
Chi Jin
Lihong Li
Liwei Wang
31
115
0
07 Oct 2021
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Mridul Agarwal
Qinbo Bai
Vaneet Aggarwal
38
12
0
12 Sep 2021
A Survey of Exploration Methods in Reinforcement Learning
Susan Amin
Maziar Gomrokchi
Harsh Satija
H. V. Hoof
Doina Precup
OffRL
43
81
0
01 Sep 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
55
44
0
18 Jul 2021
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Jean Tarbouriech
Runlong Zhou
S. Du
Matteo Pirotta
M. Valko
A. Lazaric
76
35
0
22 Apr 2021
UCB Momentum Q-learning: Correcting the bias without forgetting
Pierre Menard
O. D. Domingues
Xuedong Shang
Michal Valko
79
41
0
01 Mar 2021
Online Learning for Unknown Partially Observable MDPs
Mehdi Jafarnia-Jahromi
Rahul Jain
A. Nayyar
48
20
0
25 Feb 2021
Tactical Optimism and Pessimism for Deep Reinforcement Learning
Theodore H. Moskovitz
Jack Parker-Holder
Aldo Pacchiano
Michael Arbel
Michael I. Jordan
32
55
0
07 Feb 2021
Learning Adversarial Markov Decision Processes with Delayed Feedback
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
43
32
0
29 Dec 2020
A Provably Efficient Sample Collection Strategy for Reinforcement Learning
Jean Tarbouriech
Matteo Pirotta
Michal Valko
A. Lazaric
OffRL
35
16
0
13 Jul 2020
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
36
93
0
24 Jun 2020
A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Mehdi Jafarnia-Jahromi
Chen-Yu Wei
Rahul Jain
Haipeng Luo
33
7
0
08 Jun 2020
Tightening Exploration in Upper Confidence Reinforcement Learning
Hippolyte Bourel
Odalric-Ambrym Maillard
M. S. Talebi
30
31
0
20 Apr 2020
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Shuang Qiu
Xiaohan Wei
Zhuoran Yang
Jieping Ye
Zhaoran Wang
48
48
0
02 Mar 2020
Learning Near Optimal Policies with Low Inherent Bellman Error
Andrea Zanette
A. Lazaric
Mykel Kochenderfer
Emma Brunskill
OffRL
29
222
0
29 Feb 2020
Adaptive Approximate Policy Iteration
Botao Hao
N. Lazić
Yasin Abbasi-Yadkori
Pooria Joulani
Csaba Szepesvári
18
14
0
08 Feb 2020
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
109
104
0
15 Oct 2019
Model-Based Reinforcement Learning Exploiting State-Action Equivalence
Mahsa Asadi
M. S. Talebi
Hippolyte Bourel
Odalric-Ambrym Maillard
OffRL
24
9
0
09 Oct 2019
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
Zihan Zhang
Xiangyang Ji
21
71
0
12 Jun 2019
Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
31
7
0
07 Jun 2019
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
Yonathan Efroni
Nadav Merlis
Mohammad Ghavamzadeh
Shie Mannor
OffRL
29
68
0
27 May 2019
Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards
Wang Chi Cheung
16
17
0
15 May 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
56
273
0
01 Jan 2019
Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes
Jian Qian
Ronan Fruit
Matteo Pirotta
A. Lazaric
14
10
0
11 Dec 2018
Regret Bounds for Reinforcement Learning via Markov Chain Concentration
R. Ortner
38
46
0
06 Aug 2018
1