Is the Policy Gradient a Gradient?

17 June 2019
Chris Nota
Philip S. Thomas
arXiv: 1906.07073

Papers citing "Is the Policy Gradient a Gradient?"

13 / 13 papers shown
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Bo Liu
Ji Liu
Mohammad Ghavamzadeh
Sridhar Mahadevan
Marek Petrik
59
158
0
06 Jun 2020
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
169
5,178
0
26 Feb 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
292
8,329
0
04 Jan 2018
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Yuhuai Wu
Elman Mansimov
Shun Liao
Roger C. Grosse
Jimmy Ba
OffRL
52
626
0
17 Aug 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
468
19,006
0
20 Jul 2017
Sample Efficient Actor-Critic with Experience Replay
Ziyun Wang
V. Bapst
N. Heess
Volodymyr Mnih
Rémi Munos
Koray Kavukcuoglu
Nando de Freitas
97
761
0
03 Nov 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
191
8,850
0
04 Feb 2016
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
318
13,234
0
09 Sep 2015
Emphatic Temporal-Difference Learning
A. R. Mahmood
Huizhen Yu
Martha White
R. Sutton
151
33
0
06 Jul 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
84
3,406
0
08 Jun 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,764
0
19 Feb 2015
The Optimal Reward Baseline for Gradient-Based Reinforcement Learning
Lex Weaver
Nigel Tao
117
247
0
10 Jan 2013
The Arcade Learning Environment: An Evaluation Platform for General Agents
Marc G. Bellemare
Yavar Naddaf
J. Veness
Michael Bowling
109
3,004
0
19 Jul 2012