1907.00456
Cited By
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
30 June 2019
Natasha Jaques
Asma Ghandeharioun
J. Shen
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
Papers citing "Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog"
7 / 107 papers shown
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Serkan Cabi
Sergio Gomez Colmenarejo
Alexander Novikov
Ksenia Konyushkova
Scott E. Reed
...
David Barker
Jonathan Scholz
Misha Denil
Nando de Freitas
Ziyu Wang
OffRL
28
29
0
26 Sep 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,620
0
18 Sep 2019
Hierarchical Reinforcement Learning for Open-Domain Dialog
Abdelrhman Saleh
Natasha Jaques
Asma Ghandeharioun
J. Shen
Rosalind W. Picard
OffRL
14
59
0
17 Sep 2019
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
Asma Ghandeharioun
J. Shen
Natasha Jaques
Craig Ferguson
Noah J. Jones
Àgata Lapedriza
Rosalind W. Picard
14
91
0
21 Jun 2019
Dialogue Learning With Human-In-The-Loop
Jiwei Li
Alexander H. Miller
S. Chopra
Marc'Aurelio Ranzato
Jason Weston
OffRL
227
134
0
29 Nov 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
220
1,328
0
05 Jun 2016
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal
Zoubin Ghahramani
UQCV
BDL
287
9,167
0
06 Jun 2015