Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

11 February 2013

Papers citing "Online Regret Bounds for Undiscounted Continuous Reinforcement Learning"

| Title | Authors | | | | Date |
|---|---|---|---|---|---|
| Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms | Khashayar Khosravi, R. Leme, Chara Podimata, Apostolis Tsorvantzis | 50 | 0 | 0 | 21 Jul 2023 |
| Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning | Odalric-Ambrym Maillard, P. Nguyen, R. Ortner, D. Ryabko | 62 | 30 | 0 | 11 Feb 2013 |
| Selecting the State-Representation in Reinforcement Learning | Odalric-Ambrym Maillard, Rémi Munos, D. Ryabko | 54 | 40 | 0 | 11 Feb 2013 |
| Regret Bounds for Restless Markov Bandits | R. Ortner, D. Ryabko, P. Auer, Rémi Munos | 63 | 117 | 0 | 12 Sep 2012 |
| REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs | Peter L. Bartlett, Ambuj Tewari | 71 | 280 | 0 | 09 May 2012 |
| Multi-Armed Bandits in Metric Spaces | Robert D. Kleinberg, Aleksandrs Slivkins, E. Upfal | 215 | 468 | 0 | 29 Sep 2008 |