ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

21 February 2018
H. Maei
Topics: OffRL
Links: arXiv (abs) · PDF · HTML

Papers citing "Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation"

2 of 2 papers shown

1. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic (07 Nov 2016)
   S. Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard Turner, Sergey Levine
   Topics: OffRL, BDL

2. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning (14 Mar 2015)
   R. Sutton, A. R. Mahmood, Martha White