1911.04207
Multi-Path Policy Optimization
11 November 2019
L. Pan
Qingpeng Cai
Longbo Huang
Papers citing "Multi-Path Policy Optimization"
39 / 39 papers shown
Title
Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies
M. A. Masood
Finale Doshi-Velez
42
51
0
31 May 2019
Collaborative Evolutionary Reinforcement Learning
Shauharda Khadka
Somdeb Majumdar
Tarek Nassar
Zach Dwiel
E. Tumer
Santiago Miret
Yinyin Liu
Kagan Tumer
50
100
0
02 May 2019
Malthusian Reinforcement Learning
Joel Z Leibo
Julien Perolat
Edward Hughes
S. Wheelwright
Adam H. Marblestone
Edgar A. Duénez-Guzmán
P. Sunehag
Iain Dunning
T. Graepel
AI4CE
70
37
0
17 Dec 2018
Genetic-Gated Networks for Deep Reinforcement Learning
Simyung Chang
John Yang
Jaeseok Choi
Nojun Kwak
AI4CE
34
16
0
26 Nov 2018
ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search
Shangtong Zhang
Hao Chen
Hengshuai Yao
43
24
0
06 Nov 2018
CEM-RL: Combining evolutionary and gradient-based methods for policy search
Aloïs Pourchot
Olivier Sigaud
74
161
0
02 Oct 2018
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Jacob Buckman
Danijar Hafner
George Tucker
E. Brevdo
Honglak Lee
91
332
0
04 Jul 2018
Learning Self-Imitating Diverse Policies
Tanmay Gangwani
Qiang Liu
Jian Peng
63
67
0
25 May 2018
Evolution-Guided Policy Gradient in Reinforcement Learning
Shauharda Khadka
Kagan Tumer
114
228
0
21 May 2018
Policy Search in Continuous Action Domains: an Overview
Olivier Sigaud
F. Stulp
36
72
0
13 Mar 2018
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
180
5,204
0
26 Feb 2018
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms
Cédric Colas
Olivier Sigaud
Pierre-Yves Oudeyer
61
159
0
14 Feb 2018
Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
Zhang-Wei Hong
Tzu-Yun Shann
Shih-Yang Su
Yi-Hsiang Chang
Chun-Yi Lee
62
124
0
13 Feb 2018
Path Consistency Learning in Tsallis Entropy Regularized MDPs
Ofir Nachum
Yinlam Chow
Mohammad Ghavamzadeh
64
45
0
10 Feb 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
311
8,352
0
04 Jan 2018
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
F. Such
Vashisht Madhavan
Edoardo Conti
Joel Lehman
Kenneth O. Stanley
Jeff Clune
99
692
0
18 Dec 2017
Population Based Training of Neural Networks
Max Jaderberg
Valentin Dalibard
Simon Osindero
Wojciech M. Czarnecki
Jeff Donahue
...
Tim Green
Iain Dunning
Karen Simonyan
Chrisantha Fernando
Koray Kavukcuoglu
79
743
0
27 Nov 2017
Deep Reinforcement Learning that Matters
Peter Henderson
Riashat Islam
Philip Bachman
Joelle Pineau
Doina Precup
David Meger
OffRL
118
1,961
0
19 Sep 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
517
19,065
0
20 Jul 2017
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
71
106
0
06 Jul 2017
Noisy Networks for Exploration
Meire Fortunato
M. G. Azar
Bilal Piot
Jacob Menick
Ian Osband
...
Rémi Munos
Demis Hassabis
Olivier Pietquin
Charles Blundell
Shane Legg
79
897
0
30 Jun 2017
Parameter Space Noise for Exploration
Matthias Plappert
Rein Houthooft
Prafulla Dhariwal
Szymon Sidor
Richard Y. Chen
Xi Chen
Tamim Asfour
Pieter Abbeel
Marcin Andrychowicz
57
597
0
06 Jun 2017
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Bernhard Schölkopf
Sergey Levine
OffRL
76
165
0
01 Jun 2017
Curiosity-driven Exploration by Self-supervised Prediction
Deepak Pathak
Pulkit Agrawal
Alexei A. Efros
Trevor Darrell
LRM
SSL
113
2,439
0
15 May 2017
Stein Variational Policy Gradient
Yang Liu
Prajit Ramachandran
Qiang Liu
Jian Peng
69
140
0
07 Apr 2017
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Tim Salimans
Jonathan Ho
Xi Chen
Szymon Sidor
Ilya Sutskever
92
1,541
0
10 Mar 2017
Count-Based Exploration with Neural Density Models
Georg Ostrovski
Marc G. Bellemare
Aaron van den Oord
Rémi Munos
84
625
0
03 Mar 2017
EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
Justin Fu
John D. Co-Reyes
Sergey Levine
OffRL
55
155
0
03 Mar 2017
Reinforcement Learning with Deep Energy-Based Policies
Tuomas Haarnoja
Haoran Tang
Pieter Abbeel
Sergey Levine
110
1,340
0
27 Feb 2017
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Haoran Tang
Rein Houthooft
Davis Foote
Adam Stooke
Xi Chen
Yan Duan
John Schulman
F. Turck
Pieter Abbeel
OffRL
94
773
0
15 Nov 2016
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Sergey Levine
OffRL
BDL
88
345
0
07 Nov 2016
Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
Oron Anschel
Nir Baram
N. Shimkin
78
317
0
07 Nov 2016
Unifying Count-Based Exploration and Intrinsic Motivation
Marc G. Bellemare
S. Srinivasan
Georg Ostrovski
Tom Schaul
D. Saxton
Rémi Munos
176
1,478
0
06 Jun 2016
Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan
Xi Chen
Rein Houthooft
John Schulman
Pieter Abbeel
OffRL
82
1,695
0
22 Apr 2016
Deep Exploration via Bootstrapped DQN
Ian Osband
Charles Blundell
Alexander Pritzel
Benjamin Van Roy
121
1,309
0
15 Feb 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
Mehdi Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
202
8,859
0
04 Feb 2016
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
320
13,272
0
09 Sep 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
104
3,414
0
08 Jun 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,793
0
19 Feb 2015