Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs

18 May 2022

Papers citing "Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs"

Title | Authors | Tag | Counts | Date
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward | Washim Uddin Mondal, Vaneet Aggarwal | | 57 2 0 | 04 May 2023
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback | Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv A. Rosenberg | | 86 21 0 | 31 Jan 2022
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning | Gen Li, Laixi Shi, Yuxin Chen, Yuejie Chi | OffRL | 49 51 0 | 09 Oct 2021
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation | Yue Wu, Dongruo Zhou, Quanquan Gu | | 27 21 0 | 15 Feb 2021
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes | Dongruo Zhou, Quanquan Gu, Csaba Szepesvári | | 49 205 0 | 15 Dec 2020
Average-reward model-free reinforcement learning: a systematic review and literature mapping | Vektor Dewanto, George Dunn, A. Eshragh, M. Gallagher, Fred Roosta | | 24 29 0 | 18 Oct 2020
On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts | Jun Liu | | 19 15 0 | 21 Jul 2020
Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping | Dongruo Zhou, Jiafan He, Quanquan Gu | | 35 134 0 | 23 Jun 2020
$Q$-learning with Logarithmic Regret | Kunhe Yang, Lin F. Yang, S. Du | | 48 59 0 | 16 Jun 2020
Model-Based Reinforcement Learning with Value-Targeted Regression | Alex Ayoub, Zeyu Jia, Csaba Szepesvári, Mengdi Wang, Lin F. Yang | OffRL | 70 301 0 | 01 Jun 2020
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition | Chi Jin, Tiancheng Jin, Haipeng Luo, S. Sra, Tiancheng Yu | | 33 103 0 | 03 Dec 2019
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes | Ronan Fruit, Matteo Pirotta, A. Lazaric | | 18 61 0 | 06 Jul 2018
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning | Christoph Dann, Tor Lattimore, Emma Brunskill | | 45 307 0 | 22 Mar 2017
Why is Posterior Sampling Better than Optimism for Reinforcement Learning? | Ian Osband, Benjamin Van Roy | BDL | 74 257 0 | 01 Jul 2016
Bounded regret in stochastic multi-armed bandits | Sébastien Bubeck, Vianney Perchet, Philippe Rigollet | | 123 92 0 | 06 Feb 2013