Reward-estimation variance elimination in sequential decision processes

15 November 2018

Papers citing "Reward-estimation variance elimination in sequential decision processes"


Title
Multi-Fidelity Policy Gradient Algorithms | Xinjie Liu, Cyrus Neary, Kushagra Gupta, Christian Ellis, Ufuk Topcu, David Fridovich-Keil | OffRL | 359 | 0 | 0 | 07 Mar 2025
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines | Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M. Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel | OffRL | 35 | 151 | 0 | 20 Mar 2018
The Mirage of Action-Dependent Baselines in Reinforcement Learning | George Tucker, Surya Bhupatiraju, S. Gu, Richard Turner, Zoubin Ghahramani, Sergey Levine | OffRL | 41 | 127 | 0 | 27 Feb 2018
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation | Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud | | 75 | 300 | 0 | 31 Oct 2017
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | S. Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard Turner, Sergey Levine | OffRL, BDL | 58 | 344 | 0 | 07 Nov 2016
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates | S. Gu, E. Holly, Timothy Lillicrap, Sergey Levine | OffRL, SSL | 82 | 1,474 | 0 | 03 Oct 2016
Continuous control with deep reinforcement learning | Timothy Lillicrap, Jonathan J. Hunt, Alexander Pritzel, N. Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra | | 143 | 13,174 | 0 | 09 Sep 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation | John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel | OffRL | 31 | 3,368 | 0 | 08 Jun 2015
Trust Region Policy Optimization | John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel | | 221 | 6,722 | 0 | 19 Feb 2015
Playing Atari with Deep Reinforcement Learning | Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller | | 63 | 12,163 | 0 | 19 Dec 2013