1906.04349
Cited By
Learning to Score Behaviors for Guided Policy Optimization
11 June 2019
Aldo Pacchiano
Jack Parker-Holder
Yunhao Tang
A. Choromańska
K. Choromanski
Michael I. Jordan
Papers citing "Learning to Score Behaviors for Guided Policy Optimization"
15 / 15 papers shown
Title
Iteratively Learn Diverse Strategies with State Distance Information
Wei Fu
Weihua Du
Jingwei Li
Sunli Chen
Jingzhao Zhang
Yi Wu
51
3
0
23 Oct 2023
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz
Aaditya K. Singh
DJ Strouse
T. Sandholm
Ruslan Salakhutdinov
Anca D. Dragan
Stephen Marcus McAleer
36
47
0
06 Oct 2023
Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning
Haorui Li
Jiaqi Liang
Linjing Li
D. Zeng
11
0
0
02 Aug 2023
Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies
Hanna Ziesche
Leonel Rozo
26
5
0
17 May 2023
On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations
Tim G. J. Rudner
Cong Lu
Michael A. Osborne
Yarin Gal
Yee Whye Teh
OffRL
27
27
0
28 Dec 2022
Transfer RL via the Undo Maps Formalism
Abhi Gupta
Theodore H. Moskovitz
David Alvarez-Melis
Aldo Pacchiano
OffRL
33
0
0
26 Nov 2022
Learning General World Models in a Handful of Reward-Free Deployments
Yingchen Xu
Jack Parker-Holder
Aldo Pacchiano
Philip J. Ball
Oleh Rybkin
Stephen J. Roberts
Tim Rocktaschel
Edward Grefenstette
OffRL
55
8
0
23 Oct 2022
Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes
Mengdi Zhang
Hongyao Tang
Jianye Hao
Yan Zheng
OffRL
28
0
0
16 Sep 2022
Minimum Description Length Control
Theodore H. Moskovitz
Ta-Chu Kao
M. Sahani
M. Botvinick
26
1
0
17 Jul 2022
Agent Spaces
John C. Raisbeck
M. W. Allen
Hakho Lee
27
1
0
11 Nov 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
33
81
0
08 Nov 2021
Towards an Understanding of Default Policies in Multitask Policy Optimization
Theodore H. Moskovitz
Michael Arbel
Jack Parker-Holder
Aldo Pacchiano
25
9
0
04 Nov 2021
MICo: Improved representations via sampling-based state similarity for Markov decision processes
Pablo Samuel Castro
Tyler Kastner
Prakash Panangaden
Mark Rowland
43
35
0
03 Jun 2021
Differentiable Trust Region Layers for Deep Reinforcement Learning
Fabian Otto
P. Becker
Ngo Anh Vien
Hanna Ziesche
Gerhard Neumann
OffRL
41
19
0
22 Jan 2021
Primal Wasserstein Imitation Learning
Robert Dadashi
Léonard Hussenot
M. Geist
Olivier Pietquin
23
124
0
08 Jun 2020