ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.08238
  4. Cited By
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning

Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning

15 October 2022
Zihan Zhang
Yuhang Jiang
Yuanshuo Zhou
Xiangyang Ji
    OffRL
ArXivPDFHTML

Papers citing "Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning"

19 / 19 papers shown
Title
Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
Dan Qiao
Ming Yin
Ming Min
Yu Wang
65
28
0
13 Feb 2022
A Provably Efficient Algorithm for Linear Markov Decision Process with
  Low Switching Cost
A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost
Minbo Gao
Tianle Xie
S. Du
Lin F. Yang
60
46
0
02 Jan 2021
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal
  Algorithm Escaping the Curse of Horizon
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
95
105
0
28 Sep 2020
Multinomial Logit Bandit with Low Switching Cost
Multinomial Logit Bandit with Low Switching Cost
Kefan Dong
Yingkai Li
Qin Zhang
Yuanshuo Zhou
41
15
0
09 Jul 2020
Linear Bandits with Limited Adaptivity and Learning Distributional
  Optimal Design
Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design
Yufei Ruan
Jiaqi Yang
Yuanshuo Zhou
OffRL
149
52
0
04 Jul 2020
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
  with a Generative Model
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
Gen Li
Yuting Wei
Yuejie Chi
Yuxin Chen
94
128
0
26 May 2020
Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage
  Decomposition
Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
Zihan Zhang
Yuanshuo Zhou
Xiangyang Ji
OffRL
62
156
0
21 Apr 2020
Convergent Policy Optimization for Safe Reinforcement Learning
Convergent Policy Optimization for Safe Reinforcement Learning
Ming Yu
Zhuoran Yang
Mladen Kolar
Zhaoran Wang
63
95
0
26 Oct 2019
Regret Minimization for Reinforcement Learning by Evaluating the Optimal
  Bias Function
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
Zihan Zhang
Xiangyang Ji
50
71
0
12 Jun 2019
Provably Efficient Q-Learning with Low Switching Cost
Provably Efficient Q-Learning with Low Switching Cost
Yu Bai
Tengyang Xie
Nan Jiang
Yu Wang
63
93
0
30 May 2019
Batched Multi-armed Bandits Problem
Batched Multi-armed Bandits Problem
Zijun Gao
Yanjun Han
Zhimei Ren
Zhengqing Zhou
148
141
0
03 Apr 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning
  without Domain Knowledge using Value Function Bounds
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
97
276
0
01 Jan 2019
Policy Certificates: Towards Accountable Reinforcement Learning
Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann
Ashutosh Adhikari
Wei Wei
Jimmy J. Lin
OffRL
110
144
0
07 Nov 2018
Is Q-learning Provably Efficient?
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
63
806
0
10 Jul 2018
Device Placement Optimization with Reinforcement Learning
Device Placement Optimization with Reinforcement Learning
Azalia Mirhoseini
Hieu H. Pham
Quoc V. Le
Benoit Steiner
Rasmus Larsen
Yuefeng Zhou
Naveen Kumar
Mohammad Norouzi
Samy Bengio
J. Dean
82
440
0
13 Jun 2017
Minimax Regret Bounds for Reinforcement Learning
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
83
774
0
16 Mar 2017
Batched bandit problems
Batched bandit problems
Vianney Perchet
Philippe Rigollet
Sylvain Chassang
E. Snowberg
OffRL
176
202
0
02 May 2015
Online Learning with Switching Costs and Other Adaptive Adversaries
Online Learning with Switching Costs and Other Adaptive Adversaries
Nicolò Cesa-Bianchi
O. Dekel
Ohad Shamir
88
120
0
18 Feb 2013
REGAL: A Regularization based Algorithm for Reinforcement Learning in
  Weakly Communicating MDPs
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs
Peter L. Bartlett
Ambuj Tewari
89
283
0
09 May 2012
1