2407.05704
Cited By
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
8 July 2024
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
Papers citing
"Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization"
27 / 27 papers shown
Title
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
78
1
0
03 Jul 2024
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
Haolin Liu
Chen-Yu Wei
Julian Zimmert
54
6
0
17 Oct 2023
Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Uri Sherman
Alon Cohen
Tomer Koren
Yishay Mansour
72
7
0
28 Aug 2023
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
194
25
0
25 Jul 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
73
29
0
15 May 2023
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
97
45
0
18 Jul 2021
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin
Longbo Huang
Haipeng Luo
60
42
0
08 Jun 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
91
53
0
24 Mar 2021
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs
Jiafan He
Dongruo Zhou
Quanquan Gu
122
24
0
17 Feb 2021
Learning Adversarial Markov Decision Processes with Delayed Feedback
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
62
35
0
29 Dec 2020
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
O. D. Domingues
Pierre Ménard
E. Kaufmann
Michal Valko
57
98
0
07 Oct 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
110
107
0
28 Sep 2020
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
59
90
0
19 Feb 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
68
283
0
12 Dec 2019
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
75
104
0
03 Dec 2019
Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Daniel Russo
OffRL
48
88
0
07 Jun 2019
Online Convex Optimization in Adversarial Markov Decision Processes
Aviv A. Rosenberg
Yishay Mansour
54
138
0
19 May 2019
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
78
812
0
10 Jul 2018
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints
Aurélien Garivier
Hédi Hadiji
Pierre Ménard
Gilles Stoltz
55
33
0
14 May 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
544
19,296
0
20 Jul 2017
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann
Tor Lattimore
Emma Brunskill
83
311
0
22 Mar 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
95
778
0
16 Mar 2017
Scale-Free Algorithms for Online Linear Optimization
Francesco Orabona
D. Pál
ODL
67
53
0
19 Feb 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
279
6,801
0
19 Feb 2015
A Second-order Bound with Excess Losses
Pierre Gaillard
Gilles Stoltz
T. Erven
76
154
0
10 Feb 2014
Follow the Leader If You Can, Hedge If You Must
S. D. Rooij
T. Erven
Peter Grünwald
Wouter M. Koolen
205
181
0
03 Jan 2013
Adaptive Hedge
T. Erven
Peter Grünwald
Wouter M. Koolen
S. D. Rooij
93
50
0
28 Oct 2011