Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
Ian A. Kash, L. Reyzin, Zishun Yu
18 May 2022 · arXiv:2205.09056
Papers citing "Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs" (15 papers)
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
Washim Uddin Mondal, Vaneet Aggarwal
57
2
0
04 May 2023
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv A. Rosenberg
86
21
0
31 Jan 2022
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
Gen Li, Laixi Shi, Yuxin Chen, Yuejie Chi
OffRL
49
51
0
09 Oct 2021
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Yue Wu, Dongruo Zhou, Quanquan Gu
27
21
0
15 Feb 2021
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes
Dongruo Zhou, Quanquan Gu, Csaba Szepesvári
49
205
0
15 Dec 2020
Average-reward model-free reinforcement learning: a systematic review and literature mapping
Vektor Dewanto, George Dunn, A. Eshragh, M. Gallagher, Fred Roosta
24
29
0
18 Oct 2020
On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts
Jun Liu
19
15
0
21 Jul 2020
Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping
Dongruo Zhou, Jiafan He, Quanquan Gu
35
134
0
23 Jun 2020
Q-learning with Logarithmic Regret
Kunhe Yang, Lin F. Yang, S. Du
48
59
0
16 Jun 2020
Model-Based Reinforcement Learning with Value-Targeted Regression
Alex Ayoub, Zeyu Jia, Csaba Szepesvári, Mengdi Wang, Lin F. Yang
OffRL
70
301
0
01 Jun 2020
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Chi Jin, Tiancheng Jin, Haipeng Luo, S. Sra, Tiancheng Yu
33
103
0
03 Dec 2019
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
Ronan Fruit, Matteo Pirotta, A. Lazaric
18
61
0
06 Jul 2018
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann, Tor Lattimore, Emma Brunskill
45
307
0
22 Mar 2017
Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Ian Osband, Benjamin Van Roy
BDL
74
257
0
01 Jul 2016
Bounded regret in stochastic multi-armed bandits
Sébastien Bubeck, Vianney Perchet, Philippe Rigollet
123
92
0
06 Feb 2013