1802.10592
Model-Ensemble Trust-Region Policy Optimization
28 February 2018
Thanard Kurutach
I. Clavera
Yan Duan
Aviv Tamar
Pieter Abbeel
Papers citing
"Model-Ensemble Trust-Region Policy Optimization"
27 / 27 papers shown
Title
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
402
5
0
10 Mar 2025
Scalable Model Merging with Progressive Layer-wise Distillation
Jing Xu
Jiazheng Li
J.N. Zhang
MoMe
FedML
271
2
0
18 Feb 2025
On Reward Transferability in Adversarial Inverse Reinforcement Learning: Insights from Random Matrix Theory
Yangchun Zhang
Wang Zhou
Yirui Zhou
75
0
0
31 Dec 2024
Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
Phillip Swazinna
Steffen Udluft
Thomas Runkler
OffRL
50
83
0
12 Aug 2020
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles
Aditya Modi
Nan Jiang
Ambuj Tewari
Satinder Singh
61
131
0
23 Oct 2019
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Anusha Nagabandi
G. Kahn
R. Fearing
Sergey Levine
91
973
0
08 Aug 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
444
18,931
0
20 Jul 2017
Noisy Networks for Exploration
Meire Fortunato
M. G. Azar
Bilal Piot
Jacob Menick
Ian Osband
...
Rémi Munos
Demis Hassabis
Olivier Pietquin
Charles Blundell
Shane Legg
79
893
0
30 Jun 2017
Parameter Space Noise for Exploration
Matthias Plappert
Rein Houthooft
Prafulla Dhariwal
Szymon Sidor
Richard Y. Chen
Xi Chen
Tamim Asfour
Pieter Abbeel
Marcin Andrychowicz
52
595
0
06 Jun 2017
Prediction and Control with Temporal Segment Models
Nikhil Mishra
Pieter Abbeel
Igor Mordatch
BDL
51
64
0
12 Mar 2017
Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation
Ashvin Nair
Dian Chen
Pulkit Agrawal
Phillip Isola
Pieter Abbeel
Jitendra Malik
Sergey Levine
SSL
50
311
0
06 Mar 2017
Deep Visual Foresight for Planning Robot Motion
Chelsea Finn
Sergey Levine
111
783
0
03 Oct 2016
Learning to Poke by Poking: Experiential Learning of Intuitive Physics
Pulkit Agrawal
Ashvin Nair
Pieter Abbeel
Jitendra Malik
Sergey Levine
SSL
62
563
0
23 Jun 2016
Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks
Stefan Depeweg
José Miguel Hernández-Lobato
Finale Doshi-Velez
Steffen Udluft
BDL
52
159
0
23 May 2016
Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan
Xi Chen
Rein Houthooft
John Schulman
Pieter Abbeel
OffRL
76
1,693
0
22 Apr 2016
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
Sergey Levine
P. Pastor
A. Krizhevsky
Deirdre Quillen
160
2,071
0
07 Mar 2016
Learning Continuous Control Policies by Stochastic Value Gradients
N. Heess
Greg Wayne
David Silver
Timothy Lillicrap
Yuval Tassa
Tom Erez
95
560
0
30 Oct 2015
One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
Justin Fu
Sergey Levine
Pieter Abbeel
OffRL
59
159
0
23 Sep 2015
Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours
Lerrel Pinto
Abhinav Gupta
SSL
94
1,152
0
23 Sep 2015
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
310
13,214
0
09 Sep 2015
Action-Conditional Video Prediction using Deep Networks in Atari Games
Junhyuk Oh
Xiaoxiao Guo
Honglak Lee
Richard L. Lewis
Satinder Singh
103
852
0
31 Jul 2015
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
Manuel Watter
Jost Tobias Springenberg
Joschka Boedecker
Martin Riedmiller
BDL
63
844
0
24 Jun 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
82
3,399
0
08 Jun 2015
End-to-End Training of Deep Visuomotor Policies
Sergey Levine
Chelsea Finn
Trevor Darrell
Pieter Abbeel
BDL
288
3,431
0
02 Apr 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
274
6,755
0
19 Feb 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.6K
149,842
0
22 Dec 2014
On the difficulty of training Recurrent Neural Networks
Razvan Pascanu
Tomas Mikolov
Yoshua Bengio
ODL
182
5,334
0
21 Nov 2012