Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

18 July 2021

Papers citing "Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses"

Online Episodic Convex Reinforcement Learning. B. Moreno, Khaled Eldowa, Pierre Gaillard, Margaux Brégère, Nadia Oudjane. [OffRL] 12 May 2025.
Decision Making in Hybrid Environments: A Model Aggregation Approach. Haolin Liu, Chen-Yu Wei, Julian Zimmert. 09 Feb 2025.
A Model Selection Approach for Corruption Robust Reinforcement Learning. Chen-Yu Wei, Christoph Dann, Julian Zimmert. 31 Dec 2024.
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits. Masahiro Kato, Shinji Ito. 05 Mar 2024.
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes. Chen Ye, Wei Xiong, Quanquan Gu, Tong Zhang. 12 Dec 2022.
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation. Andrea Zanette, Ching-An Cheng, Alekh Agarwal. 24 Mar 2021.
Improved Regret Bound and Experience Replay in Regularized Policy Iteration. N. Lazić, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvári. [OffRL] 25 Feb 2021.
Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case. Liyu Chen, Haipeng Luo. 10 Feb 2021.
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition. Liyu Chen, Haipeng Luo, Chen-Yu Wei. 07 Dec 2020.
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation. Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Rahul Jain. 23 Jul 2020.
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning. Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun. [OffRL] 16 Jul 2020.
Online learning in MDPs with linear function approximation and bandit feedback. Gergely Neu, Julia Olkhovskaya. 03 Jul 2020.
On Reward-Free Reinforcement Learning with Linear Function Approximation. Ruosong Wang, S. Du, Lin F. Yang, Ruslan Salakhutdinov. [OffRL] 19 Jun 2020.
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs. Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang. 14 Jun 2020.
Optimistic Policy Optimization with Bandit Feedback. Yonathan Efroni, Lior Shani, Aviv A. Rosenberg, Shie Mannor. 19 Feb 2020.
Provably Efficient Exploration in Policy Optimization. Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang. 12 Dec 2019.
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition. Chi Jin, Tiancheng Jin, Haipeng Luo, S. Sra, Tiancheng Yu. 03 Dec 2019.
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration. Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, A. Lazaric. 01 Nov 2019.
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes. Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, R. Jain. 15 Oct 2019.
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift. Alekh Agarwal, Sham Kakade, Jason D. Lee, G. Mahajan. 01 Aug 2019.
Provably Efficient Reinforcement Learning with Linear Function Approximation. Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan. 11 Jul 2019.
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound. Lin F. Yang, Mengdi Wang. [OffRL, GP] 24 May 2019.
Online Convex Optimization in Adversarial Markov Decision Processes. Aviv A. Rosenberg, Yishay Mansour. 19 May 2019.
Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP. Kefan Dong, Yuanhao Wang, Xiaoyu Chen, Liwei Wang. [OffRL] 27 Jan 2019.
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds. Andrea Zanette, Emma Brunskill. [OffRL] 01 Jan 2019.
Is Q-learning Provably Efficient? Chi Jin, Zeyuan Allen-Zhu, Sébastien Bubeck, Michael I. Jordan. [OffRL] 10 Jul 2018.
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning. Ronan Fruit, Matteo Pirotta, A. Lazaric, R. Ortner. 12 Feb 2018.
Proximal Policy Optimization Algorithms. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. [OffRL] 20 Jul 2017.
Minimax Regret Bounds for Reinforcement Learning. M. G. Azar, Ian Osband, Rémi Munos. 16 Mar 2017.
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning. Christoph Dann, Emma Brunskill. 29 Oct 2015.
Explore no more: Improved high-probability regret bounds for non-stochastic bandits. Gergely Neu. 10 Jun 2015.
On the Sample Complexity of Reinforcement Learning with a Generative Model. M. G. Azar, Rémi Munos, H. Kappen. 27 Jun 2012.
Contextual Bandit Algorithms with Supervised Learning Guarantees. A. Beygelzimer, John Langford, Lihong Li, L. Reyzin, Robert Schapire. [OffRL] 22 Feb 2010.