Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

30 June 2019

Papers citing "Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog"

7 / 107 papers shown

Title
Scaling data-driven robotics with reward sketching and batch reinforcement learning Serkan Cabi Sergio Gomez Colmenarejo Alexander Novikov Ksenia Konyushkova Scott E. Reed ... David Barker Jonathan Scholz Misha Denil Nando de Freitas Ziyun Wang OffRL 28 29 0 26 Sep 2019
Fine-Tuning Language Models from Human Preferences Daniel M. Ziegler Nisan Stiennon Jeff Wu Tom B. Brown Alec Radford Dario Amodei Paul Christiano G. Irving ALM 301 1,620 0 18 Sep 2019
Hierarchical Reinforcement Learning for Open-Domain Dialog Abdelrhman Saleh Natasha Jaques Asma Ghandeharioun J. Shen Rosalind W. Picard OffRL 14 59 0 17 Sep 2019
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems Asma Ghandeharioun J. Shen Natasha Jaques Craig Ferguson Noah J. Jones Àgata Lapedriza Rosalind W. Picard 14 91 0 21 Jun 2019
Dialogue Learning With Human-In-The-Loop Jiwei Li Alexander H. Miller S. Chopra Marc'Aurelio Ranzato Jason Weston OffRL 227 134 0 29 Nov 2016
Deep Reinforcement Learning for Dialogue Generation Jiwei Li Will Monroe Alan Ritter Michel Galley Jianfeng Gao Dan Jurafsky 220 1,328 0 05 Jun 2016
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning Y. Gal Zoubin Ghahramani UQCV BDL 287 9,167 0 06 Jun 2015