Posterior sampling for reinforcement learning: worst-case regret bounds



19 May 2017


Papers citing "Posterior sampling for reinforcement learning: worst-case regret bounds"


Title | Authors | Tag | Counts | Date
Why is Posterior Sampling Better than Optimism for Reinforcement Learning? | Ian Osband, Benjamin Van Roy | BDL | 83, 261, 0 | 01 Jul 2016
Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm | Yasin Abbasi-Yadkori, Csaba Szepesvári | | 84, 19, 0 | 16 Jun 2014
Generalization and Exploration via Randomized Value Functions | Ian Osband, Benjamin Van Roy, Zheng Wen | | 79, 314, 0 | 04 Feb 2014
Thompson Sampling for Contextual Bandits with Linear Payoffs | Shipra Agrawal, Navin Goyal | | 195, 1,004, 0 | 15 Sep 2012
A Bayesian Sampling Approach to Exploration in Reinforcement Learning | J. Asmuth, Lihong Li, Michael L. Littman, A. Nouri, David Wingate | BDL | 87, 189, 0 | 09 May 2012
PAC-Bayesian Inequalities for Martingales | Yevgeny Seldin, François Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, P. Auer | | 126, 127, 0 | 31 Oct 2011