ResearchTrend.AI

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

30 June 2019
Natasha Jaques
Asma Ghandeharioun
J. Shen
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
    OffRL
arXiv: 1907.00456

Papers citing "Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog"

7 / 107 papers shown
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Serkan Cabi
Sergio Gomez Colmenarejo
Alexander Novikov
Ksenia Konyushkova
Scott E. Reed
...
David Barker
Jonathan Scholz
Misha Denil
Nando de Freitas
Ziyun Wang
OffRL
28
29
0
26 Sep 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,620
0
18 Sep 2019
Hierarchical Reinforcement Learning for Open-Domain Dialog
Abdelrhman Saleh
Natasha Jaques
Asma Ghandeharioun
J. Shen
Rosalind W. Picard
OffRL
14
59
0
17 Sep 2019
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
Asma Ghandeharioun
J. Shen
Natasha Jaques
Craig Ferguson
Noah J. Jones
Àgata Lapedriza
Rosalind W. Picard
14
91
0
21 Jun 2019
Dialogue Learning With Human-In-The-Loop
Jiwei Li
Alexander H. Miller
S. Chopra
Marc'Aurelio Ranzato
Jason Weston
OffRL
227
134
0
29 Nov 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
220
1,328
0
05 Jun 2016
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal
Zoubin Ghahramani
UQCV
BDL
287
9,167
0
06 Jun 2015