Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

23 July 2020

Chen-Yu Wei

Mehdi Jafarnia-Jahromi

Haipeng Luo

Rahul Jain

Papers citing "Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation"

Improved Analysis of UCRL2 with Empirical Bernstein Inequality Ronan Fruit Matteo Pirotta A. Lazaric 34 33 0 10 Jul 2020
Online learning in MDPs with linear function approximation and bandit feedback Gergely Neu Julia Olkhovskaya 36 32 0 03 Jul 2020
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension Ruosong Wang Ruslan Salakhutdinov Lin F. Yang 62 55 0 21 May 2020
Learning Near Optimal Policies with Low Inherent Bellman Error Andrea Zanette A. Lazaric Mykel Kochenderfer Emma Brunskill OffRL 71 222 0 29 Feb 2020
Adaptive Approximate Policy Iteration Botao Hao N. Lazić Yasin Abbasi-Yadkori Pooria Joulani Csaba Szepesvári 61 14 0 08 Feb 2020
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes Chen-Yu Wei Mehdi Jafarnia-Jahromi Haipeng Luo Hiteshi Sharma R. Jain 132 106 0 15 Oct 2019
$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank Kefan Dong Jian-wei Peng Yining Wang Yuanshuo Zhou OffRL 53 36 0 05 Sep 2019
Neural Policy Gradient Methods: Global Optimality and Rates of Convergence Lingxiao Wang Qi Cai Zhuoran Yang Zhaoran Wang 85 241 0 29 Aug 2019
Exploration-Enhanced POLITEX Yasin Abbasi-Yadkori N. Lazić Csaba Szepesvári Gellert Weisz 52 23 0 27 Aug 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift Alekh Agarwal Sham Kakade Jason D. Lee G. Mahajan 69 321 0 01 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation Chi Jin Zhuoran Yang Zhaoran Wang Michael I. Jordan 96 557 0 11 Jul 2019
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy Boyi Liu Qi Cai Zhuoran Yang Zhaoran Wang 73 111 0 25 Jun 2019
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function Zihan Zhang Xiangyang Ji 60 72 0 12 Jun 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound Lin F. Yang Mengdi Wang OffRL GP 62 286 0 24 May 2019
Regret Bounds for Reinforcement Learning via Markov Chain Concentration R. Ortner 67 46 0 06 Aug 2018
Scalable Bilinear $π$ Learning Using State and Action Features Yichen Chen Lihong Li Mengdi Wang 64 46 0 27 Apr 2018
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs M. S. Talebi Odalric-Ambrym Maillard 56 72 0 05 Mar 2018
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning Ronan Fruit Matteo Pirotta A. Lazaric R. Ortner 86 116 0 12 Feb 2018
Proximal Policy Optimization Algorithms John Schulman Filip Wolski Prafulla Dhariwal Alec Radford Oleg Klimov OffRL 517 19,065 0 20 Jul 2017
A unified view of entropy-regularized Markov decision processes Gergely Neu Anders Jonsson Vicenç Gómez 97 263 0 22 May 2017
Asynchronous Methods for Deep Reinforcement Learning Volodymyr Mnih Adria Puigdomenech Badia M. Berk Mirza Alex Graves Timothy Lillicrap Tim Harley David Silver Koray Kavukcuoglu 199 8,859 0 04 Feb 2016
Trust Region Policy Optimization John Schulman Sergey Levine Philipp Moritz Michael I. Jordan Pieter Abbeel 277 6,776 0 19 Feb 2015
Generalization and Exploration via Randomized Value Functions Ian Osband Benjamin Van Roy Zheng Wen 79 314 0 04 Feb 2014
Volumetric Spanners: an Efficient Exploration Basis for Learning Elad Hazan Zohar Karnin Raghu Meka 255 97 0 21 Dec 2013
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs Peter L. Bartlett Ambuj Tewari 91 284 0 09 May 2012
Towards minimax policies for online linear optimization with bandit feedback Sébastien Bubeck Nicolò Cesa-Bianchi Sham Kakade OffRL 283 150 0 14 Feb 2012