Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

10 June 2020

Papers citing "Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition"

Title
A Model Selection Approach for Corruption Robust Reinforcement Learning | Chen-Yu Wei, Christoph Dann, Julian Zimmert | - | 99 44 0 | 31 Dec 2024
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs | Yu Chen, Jiatai Huang, Yan Dai, Longbo Huang | - | 101 0 0 | 04 Oct 2024
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits | Masahiro Kato, Shinji Ito | - | 83 0 0 | 05 Mar 2024
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes | Chen Ye, Wei Xiong, Quanquan Gu, Tong Zhang | - | 89 31 0 | 12 Dec 2022
A Closer Look at Small-loss Bounds for Bandits with Graph Feedback | Chung-Wei Lee, Haipeng Luo, Mengxiao Zhang | - | 29 24 0 | 02 Feb 2020
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition | Chi Jin, Tiancheng Jin, Haipeng Luo, S. Sra, Tiancheng Yu | - | 43 103 0 | 03 Dec 2019
Corruption-robust exploration in episodic reinforcement learning | Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun | - | 41 105 0 | 20 Nov 2019
Introduction to Online Convex Optimization | Elad Hazan | OffRL | 90 1,922 0 | 07 Sep 2019
Equipping Experts/Bandits with Long-term Memory | Kai Zheng, Haipeng Luo, Ilias Diakonikolas, Liwei Wang | OffRL | 36 15 0 | 30 May 2019
Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs | Max Simchowitz, Kevin Jamieson | - | 50 144 0 | 09 May 2019
Better Algorithms for Stochastic Bandits with Adversarial Corruptions | Anupam Gupta, Tomer Koren, Kunal Talwar | AAML | 56 152 0 | 22 Feb 2019
Bandit Principal Component Analysis | W. Kotłowski, Gergely Neu | - | 29 17 0 | 08 Feb 2019
Improved Path-length Regret Bounds for Bandits | Sébastien Bubeck, Yuanzhi Li, Haipeng Luo, Chen-Yu Wei | - | 54 46 0 | 29 Jan 2019
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously | Julian Zimmert, Haipeng Luo, Chen-Yu Wei | - | 59 81 0 | 25 Jan 2019
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits | Julian Zimmert, Yevgeny Seldin | AAML | 46 177 0 | 19 Jul 2018
Stochastic bandits robust to adversarial corruptions | Thodoris Lykouris, Vahab Mirrokni, R. Leme | AAML | 73 203 0 | 25 Mar 2018
More Adaptive Algorithms for Adversarial Bandits | Chen-Yu Wei, Haipeng Luo | - | 79 181 0 | 10 Jan 2018
Sparsity, variance and curvature in multi-armed bandits | Sébastien Bubeck, Michael B. Cohen, Yuanzhi Li | - | 69 60 0 | 03 Nov 2017
Minimax Regret Bounds for Reinforcement Learning | M. G. Azar, Ian Osband, Rémi Munos | - | 65 771 0 | 16 Mar 2017
An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits | Yevgeny Seldin, Gábor Lugosi | - | 33 92 0 | 20 Feb 2017
An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits | P. Auer, Chao-Kai Chiang | - | 22 110 0 | 27 May 2016