Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon
MDP

v1v2 (latest)

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

27 January 2019

ArXiv (abs)PDF HTML

Papers citing "Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP"

11 / 11 papers shown

Title
The Bandit Whisperer: Communication Learning for Restless Bandits Yunfan Zhao Tonghan Wang Dheeraj M. Nagaraj Aparna Taneja Milind Tambe 112 6 0 11 Aug 2024
Learning to Steer Markovian Agents under Model Uncertainty Jiawei Huang Vinzenz Thoma Zebang Shen H. Nax Niao He 95 2 0 14 Jul 2024
Settling the Sample Complexity of Online Reinforcement Learning Zihan Zhang Yuxin Chen Jason D. Lee S. Du OffRL 194 25 0 25 Jul 2023
Is Q-learning Provably Efficient? Chi Jin Zeyuan Allen-Zhu Sébastien Bubeck Michael I. Jordan OffRL 78 812 0 10 Jul 2018
Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes Aaron Sidford Mengdi Wang X. Wu Yinyu Ye 65 128 0 27 Oct 2017
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning Christoph Dann Tor Lattimore Emma Brunskill 83 311 0 22 Mar 2017
Minimax Regret Bounds for Reinforcement Learning M. G. Azar Ian Osband Rémi Munos 95 778 0 16 Mar 2017
Asynchronous Methods for Deep Reinforcement Learning Volodymyr Mnih Adria Puigdomenech Badia M. Berk Mirza Alex Graves Timothy Lillicrap Tim Harley David Silver Koray Kavukcuoglu 207 8,881 0 04 Feb 2016
Trust Region Policy Optimization John Schulman Sergey Levine Philipp Moritz Michael I. Jordan Pieter Abbeel 279 6,801 0 19 Feb 2015
Playing Atari with Deep Reinforcement Learning Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra Martin Riedmiller 129 12,269 0 19 Dec 2013
PAC Bounds for Discounted MDPs Tor Lattimore Marcus Hutter 104 190 0 17 Feb 2012

We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from. See our policy.