Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.01891
Cited By
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
6 July 2017
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Trust-PCL: An Off-Policy Trust Region Method for Continuous Control"
21 / 21 papers shown
Title
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
478
19,019
0
20 Jul 2017
Emergence of Locomotion Behaviours in Rich Environments
N. Heess
TB Dhruva
S. Sriram
Jay Lemmon
J. Merel
...
Tom Erez
Ziyun Wang
S. M. Ali Eslami
Martin Riedmiller
David Silver
202
937
0
07 Jul 2017
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Bernhard Schölkopf
Sergey Levine
OffRL
69
165
0
01 Jun 2017
Discrete Sequential Prediction of Continuous Actions for Deep RL
Luke Metz
Julian Ibarz
Navdeep Jaitly
James Davidson
BDL
OffRL
74
119
0
14 May 2017
Equivalence Between Policy Gradients and Soft Q-Learning
John Schulman
Xi Chen
Pieter Abbeel
OffRL
83
346
0
21 Apr 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
156
472
0
28 Feb 2017
Reinforcement Learning with Deep Energy-Based Policies
Tuomas Haarnoja
Haoran Tang
Pieter Abbeel
Sergey Levine
100
1,340
0
27 Feb 2017
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Sergey Levine
OffRL
BDL
88
345
0
07 Nov 2016
Sample Efficient Actor-Critic with Experience Replay
Ziyun Wang
V. Bapst
N. Heess
Volodymyr Mnih
Rémi Munos
Koray Kavukcuoglu
Nando de Freitas
97
761
0
03 Nov 2016
Reward Augmented Maximum Likelihood for Neural Structured Prediction
Mohammad Norouzi
Samy Bengio
Zhiwen Chen
Navdeep Jaitly
M. Schuster
Yonghui Wu
Dale Schuurmans
77
253
0
01 Sep 2016
TensorFlow: A system for large-scale machine learning
Martín Abadi
P. Barham
Jianmin Chen
Zhiwen Chen
Andy Davis
...
Vijay Vasudevan
Pete Warden
Martin Wicke
Yuan Yu
Xiaoqiang Zhang
GNN
AI4CE
431
18,350
0
27 May 2016
Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan
Xi Chen
Rein Houthooft
John Schulman
Pieter Abbeel
OffRL
79
1,693
0
22 Apr 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
194
8,851
0
04 Feb 2016
Deep Reinforcement Learning with Double Q-learning
H. V. Hasselt
A. Guez
David Silver
OffRL
167
7,639
0
22 Sep 2015
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
318
13,237
0
09 Sep 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
90
3,406
0
08 Jun 2015
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
James Martens
Roger C. Grosse
ODL
101
1,013
0
19 Mar 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,767
0
19 Feb 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.8K
150,039
0
22 Dec 2014
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
125
12,227
0
19 Dec 2013
Dynamic Policy Programming
M. G. Azar
Vicencc Gómez
H. Kappen
109
123
0
12 Apr 2010
1