Adaptive Approximate Policy Iteration

8 February 2020

Papers citing "Adaptive Approximate Policy Iteration"


Title | Authors | Tags | | | | Date
Learning Expected Reward for Switched Linear Control Systems: A Non-Asymptotic View | Muhammad Naeem, Miroslav Pajic | | 30 | 1 | 0 | 15 Jun 2020
Provably Efficient Exploration in Policy Optimization | Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang | | 39 | 278 | 0 | 12 Dec 2019
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes | Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, R. Jain | | 121 | 104 | 0 | 15 Oct 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation | Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan | | 76 | 549 | 0 | 11 Jul 2019
Worst-Case Regret Bounds for Exploration via Randomized Value Functions | Daniel Russo | OffRL | 26 | 82 | 0 | 07 Jun 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound | Lin F. Yang, Mengdi Wang | OffRL, GP | 50 | 284 | 0 | 24 May 2019
A Theory of Regularized Markov Decision Processes | Matthieu Geist, B. Scherrer, Olivier Pietquin | | 84 | 317 | 0 | 31 Jan 2019
Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP | Kefan Dong, Yuanhao Wang, Xiaoyu Chen, Liwei Wang | OffRL | 36 | 95 | 0 | 27 Jan 2019
Is Q-learning Provably Efficient? | Chi Jin, Zeyuan Allen-Zhu, Sébastien Bubeck, Michael I. Jordan | OffRL | 52 | 801 | 0 | 10 Jul 2018
Maximum a Posteriori Policy Optimisation | A. Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Rémi Munos, N. Heess, Martin Riedmiller | | 64 | 471 | 0 | 14 Jun 2018
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs | M. S. Talebi, Odalric-Ambrym Maillard | | 47 | 72 | 0 | 05 Mar 2018
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning | Ronan Fruit, Matteo Pirotta, A. Lazaric, R. Ortner | | 54 | 115 | 0 | 12 Feb 2018
Learning Unknown Markov Decision Processes: A Thompson Sampling Approach | Ouyang Yi, Mukul Gagrani, A. Nayyar, R. Jain | | 27 | 126 | 0 | 14 Sep 2017
A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds | Pooria Joulani, András Gyorgy, Csaba Szepesvári | | 20 | 42 | 0 | 08 Sep 2017
Proximal Policy Optimization Algorithms | John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov | OffRL | 234 | 18,685 | 0 | 20 Jul 2017
Deep Exploration via Randomized Value Functions | Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen | | 71 | 302 | 0 | 22 Mar 2017
Deep Reinforcement Learning with Double Q-learning | H. V. Hasselt, A. Guez, David Silver | OffRL | 131 | 7,590 | 0 | 22 Sep 2015
Trust Region Policy Optimization | John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel | | 239 | 6,722 | 0 | 19 Feb 2015
Generalization and Exploration via Randomized Value Functions | Ian Osband, Benjamin Van Roy, Zheng Wen | | 67 | 314 | 0 | 04 Feb 2014
Optimization, Learning, and Games with Predictable Sequences | Alexander Rakhlin, Karthik Sridharan | | 54 | 377 | 0 | 08 Nov 2013
Online Learning with Predictable Sequences | Alexander Rakhlin, Karthik Sridharan | | 112 | 355 | 0 | 18 Aug 2012
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs | Peter L. Bartlett, Ambuj Tewari | | 71 | 280 | 0 | 09 May 2012