Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization

8 July 2024

Papers citing "Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization"

Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes Asaf B. Cassel Aviv A. Rosenberg 78 1 0 03 Jul 2024
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback Haolin Liu Chen-Yu Wei Julian Zimmert 54 6 0 17 Oct 2023
Rate-Optimal Policy Optimization for Linear Markov Decision Processes Uri Sherman Alon Cohen Tomer Koren Yishay Mansour 72 7 0 28 Aug 2023
Settling the Sample Complexity of Online Reinforcement Learning Zihan Zhang Yuxin Chen Jason D. Lee S. Du OffRL 194 25 0 25 Jul 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes Han Zhong Tong Zhang 73 29 0 15 May 2023
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses Haipeng Luo Chen-Yu Wei Chung-Wei Lee 97 45 0 18 Jul 2021
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition Tiancheng Jin Longbo Huang Haipeng Luo 60 42 0 08 Jun 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation Andrea Zanette Ching-An Cheng Alekh Agarwal 91 53 0 24 Mar 2021
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs Jiafan He Dongruo Zhou Quanquan Gu 122 24 0 17 Feb 2021
Learning Adversarial Markov Decision Processes with Delayed Feedback Tal Lancewicki Aviv A. Rosenberg Yishay Mansour 62 35 0 29 Dec 2020
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited O. D. Domingues Pierre Ménard E. Kaufmann Michal Valko 57 98 0 07 Oct 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon Zihan Zhang Xiangyang Ji S. Du OffRL 110 107 0 28 Sep 2020
Optimistic Policy Optimization with Bandit Feedback Yonathan Efroni Lior Shani Aviv A. Rosenberg Shie Mannor 59 90 0 19 Feb 2020
Provably Efficient Exploration in Policy Optimization Qi Cai Zhuoran Yang Chi Jin Zhaoran Wang 68 283 0 12 Dec 2019
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition Chi Jin Tiancheng Jin Haipeng Luo S. Sra Tiancheng Yu 75 104 0 03 Dec 2019
Worst-Case Regret Bounds for Exploration via Randomized Value Functions Daniel Russo OffRL 48 88 0 07 Jun 2019
Online Convex Optimization in Adversarial Markov Decision Processes Aviv A. Rosenberg Yishay Mansour 54 138 0 19 May 2019
Is Q-learning Provably Efficient? Chi Jin Zeyuan Allen-Zhu Sébastien Bubeck Michael I. Jordan OffRL 78 812 0 10 Jul 2018
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints Aurélien Garivier Hédi Hadiji Pierre Menard Gilles Stoltz 55 33 0 14 May 2018
Proximal Policy Optimization Algorithms John Schulman Filip Wolski Prafulla Dhariwal Alec Radford Oleg Klimov OffRL 544 19,296 0 20 Jul 2017
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning Christoph Dann Tor Lattimore Emma Brunskill 83 311 0 22 Mar 2017
Minimax Regret Bounds for Reinforcement Learning M. G. Azar Ian Osband Rémi Munos 95 778 0 16 Mar 2017
Scale-Free Algorithms for Online Linear Optimization Francesco Orabona D. Pál ODL 67 53 0 19 Feb 2015
Trust Region Policy Optimization John Schulman Sergey Levine Philipp Moritz Michael I. Jordan Pieter Abbeel 279 6,801 0 19 Feb 2015
A Second-order Bound with Excess Losses Pierre Gaillard Gilles Stoltz T. Erven 76 154 0 10 Feb 2014
Follow the Leader If You Can, Hedge If You Must S. D. Rooij T. Erven Peter Grünwald Wouter M. Koolen 205 181 0 03 Jan 2013
Adaptive Hedge T. Erven Peter Grünwald Wouter M. Koolen S. D. Rooij 93 50 0 28 Oct 2011