Square-root regret bounds for continuous-time episodic Markov decision processes

3 October 2022
Xuefeng Gao
X. Zhou

Papers citing "Square-root regret bounds for continuous-time episodic Markov decision processes"

17 / 17 papers shown
Statistical Learning with Sublinear Regret of Propagator Models
Eyal Neuman
Yufei Zhang
62
7
0
12 Jan 2023
q-Learning in Continuous Time
Yanwei Jia
X. Zhou
OffRL
78
75
0
02 Jul 2022
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Xuefeng Gao
X. Zhou
67
8
0
23 May 2022
Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models
Lukasz Szpruch
Tanut Treetanthiploet
Yufei Zhang
41
23
0
19 Dec 2021
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
Yanwei Jia
X. Zhou
OffRL
114
83
0
22 Nov 2021
Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach
Yanwei Jia
X. Zhou
OffRL
51
65
0
15 Aug 2021
Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls
Xin Guo
Anran Hu
Yufei Zhang
51
24
0
19 Apr 2021
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
O. D. Domingues
Pierre Ménard
E. Kaufmann
Michal Valko
52
97
0
07 Oct 2020
Logarithmic regret for episodic continuous-time linear-quadratic reinforcement learning over a finite-time horizon
Matteo Basei
Xin Guo
Anran Hu
Yufei Zhang
26
41
0
27 Jun 2020
Making Deep Q-learning methods robust to time discretization
Corentin Tallec
Léonard Blier
Yann Ollivier
OOD
OffRL
31
91
0
28 Jan 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
97
276
0
01 Jan 2019
Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann
Ashutosh Adhikari
Wei Wei
Jimmy J. Lin
OffRL
110
144
0
07 Nov 2018
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
63
806
0
10 Jul 2018
Exploration–Exploitation in MDPs with Options
Ronan Fruit
A. Lazaric
41
41
0
25 Mar 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
83
774
0
16 Mar 2017
Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Ian Osband
Benjamin Van Roy
BDL
76
260
0
01 Jul 2016
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Aurélien Garivier
Pierre Ménard
Gilles Stoltz
49
213
0
23 Feb 2016