2210.00832
Square-root regret bounds for continuous-time episodic Markov decision processes
3 October 2022
Xuefeng Gao
X. Zhou
Papers citing
"Square-root regret bounds for continuous-time episodic Markov decision processes"
17 / 17 papers shown
Statistical Learning with Sublinear Regret of Propagator Models
Eyal Neuman
Yufei Zhang
62
7
0
12 Jan 2023
q-Learning in Continuous Time
Yanwei Jia
X. Zhou
OffRL
78
75
0
02 Jul 2022
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Xuefeng Gao
X. Zhou
67
8
0
23 May 2022
Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models
Lukasz Szpruch
Tanut Treetanthiploet
Yufei Zhang
41
23
0
19 Dec 2021
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
Yanwei Jia
X. Zhou
OffRL
114
83
0
22 Nov 2021
Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach
Yanwei Jia
X. Zhou
OffRL
51
65
0
15 Aug 2021
Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls
Xin Guo
Anran Hu
Yufei Zhang
51
24
0
19 Apr 2021
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
O. D. Domingues
Pierre Ménard
E. Kaufmann
Michal Valko
52
97
0
07 Oct 2020
Logarithmic regret for episodic continuous-time linear-quadratic reinforcement learning over a finite-time horizon
Matteo Basei
Xin Guo
Anran Hu
Yufei Zhang
26
41
0
27 Jun 2020
Making Deep Q-learning methods robust to time discretization
Corentin Tallec
Léonard Blier
Yann Ollivier
OOD
OffRL
31
91
0
28 Jan 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
97
276
0
01 Jan 2019
Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann
Lihong Li
Wei Wei
Emma Brunskill
OffRL
110
144
0
07 Nov 2018
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
63
806
0
10 Jul 2018
Exploration–Exploitation in MDPs with Options
Ronan Fruit
A. Lazaric
41
41
0
25 Mar 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
83
774
0
16 Mar 2017
Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Ian Osband
Benjamin Van Roy
BDL
76
260
0
01 Jul 2016
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Aurélien Garivier
Pierre Ménard
Gilles Stoltz
49
213
0
23 Feb 2016