Examining average and discounted reward optimality criteria in reinforcement learning

3 July 2021

Papers citing "Examining average and discounted reward optimality criteria in reinforcement learning"

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs | Yujia Jin, Aaron Sidford | 35 31 0 | 13 Jun 2021
Average-reward model-free reinforcement learning: a systematic review and literature mapping | Vektor Dewanto, George Dunn, A. Eshragh, M. Gallagher, Fred Roosta | 64 30 0 | 18 Oct 2020
Zeroth-order Deterministic Policy Gradient | Harshat Kumar, Dionysios S. Kalogerias, George J. Pappas, Alejandro Ribeiro | OffRL | 27 14 0 | 12 Jun 2020
Is the Policy Gradient a Gradient? | Chris Nota, Philip S. Thomas | 76 58 0 | 17 Jun 2019
Deep Reinforcement Learning that Matters | Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, David Meger | OffRL | 118 1,954 0 | 19 Sep 2017
Unifying task specification in reinforcement learning | Martha White | OffRL | 52 90 0 | 07 Sep 2016
OpenAI Gym | Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba | OffRL, ODL | 223 5,077 0 | 05 Jun 2016
Infinite-Horizon Policy-Gradient Estimation | Jonathan Baxter, Peter L. Bartlett | 100 811 0 | 03 Jun 2011
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view | B. Scherrer | 82 102 0 | 19 Nov 2010