Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1906.07073
Cited By
Is the Policy Gradient a Gradient?
17 June 2019
Chris Nota
Philip S. Thomas
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Is the Policy Gradient a Gradient?"
13 / 13 papers shown
Title
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Bo Liu
Ji Liu
Mohammad Ghavamzadeh
Sridhar Mahadevan
Marek Petrik
59
158
0
06 Jun 2020
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
169
5,178
0
26 Feb 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
292
8,329
0
04 Jan 2018
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Yuhuai Wu
Elman Mansimov
Shun Liao
Roger C. Grosse
Jimmy Ba
OffRL
52
626
0
17 Aug 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
468
19,006
0
20 Jul 2017
Sample Efficient Actor-Critic with Experience Replay
Ziyun Wang
V. Bapst
N. Heess
Volodymyr Mnih
Rémi Munos
Koray Kavukcuoglu
Nando de Freitas
97
761
0
03 Nov 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
191
8,850
0
04 Feb 2016
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
318
13,234
0
09 Sep 2015
Emphatic Temporal-Difference Learning
A. R. Mahmood
Huizhen Yu
Martha White
R. Sutton
151
33
0
06 Jul 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
84
3,406
0
08 Jun 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,764
0
19 Feb 2015
The Optimal Reward Baseline for Gradient-Based Reinforcement Learning
Lex Weaver
Nigel Tao
117
247
0
10 Jan 2013
The Arcade Learning Environment: An Evaluation Platform for General Agents
Marc G. Bellemare
Yavar Naddaf
J. Veness
Michael Bowling
109
3,004
0
19 Jul 2012
1