2103.01312
UCB Momentum Q-learning: Correcting the bias without forgetting
1 March 2021
Pierre Ménard
O. D. Domingues
Xuedong Shang
Michal Valko
Papers citing
"UCB Momentum Q-learning: Correcting the bias without forgetting"
12 / 12 papers shown
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
111
2
0
10 Oct 2024
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
Xutong Liu
Siwei Wang
Jinhang Zuo
Han Zhong
Xuchuang Wang
Zhiyong Wang
Shuai Li
Mohammad Hajiesmaili
J. C. Lui
Wei Chen
178
4
0
03 Jun 2024
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
157
23
0
25 Jul 2023
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
89
105
0
28 Sep 2020
Momentum Q-learning with Finite-Sample Convergence Guarantee
Bowen Weng
Huaqing Xiong
Linna Zhao
Yingbin Liang
Wei Zhang
43
8
0
30 Jul 2020
Fast active learning for pure exploration in reinforcement learning
Pierre Ménard
O. D. Domingues
Anders Jonsson
E. Kaufmann
Edouard Leurent
Michal Valko
40
95
0
27 Jul 2020
Adaptive Discretization for Model-Based Reinforcement Learning
Sean R. Sinclair
Tianyu Wang
Gauri Jain
Siddhartha Banerjee
Chao Yu
OffRL
36
21
0
01 Jul 2020
Adaptive Reward-Free Exploration
E. Kaufmann
Pierre Ménard
O. D. Domingues
Anders Jonsson
Edouard Leurent
Michal Valko
50
81
0
11 Jun 2020
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
Yonathan Efroni
Nadav Merlis
Mohammad Ghavamzadeh
Shie Mannor
OffRL
80
68
0
27 May 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
95
276
0
01 Jan 2019
Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
Aaron Sidford
Mengdi Wang
X. Wu
Yinyu Ye
52
125
0
27 Oct 2017
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann
Tor Lattimore
Emma Brunskill
72
308
0
22 Mar 2017