Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems

20 February 2023

Papers citing "Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems"

8 / 8 papers shown

Title
Are LLMs All You Need for Task-Oriented Dialogue? Vojtvech Hudevcek Ondrej Dusek 26 56 0 13 Apr 2023
Offline RL for Natural Language Generation with Implicit Language Q Learning Charles Burton Snell Ilya Kostrikov Yi Su Mengjiao Yang Sergey Levine OffRL 125 102 0 05 Jun 2022
Teaching language models to support answers with verified quotes Jacob Menick Maja Trebacz Vladimir Mikulik John Aslanides Francis Song ... Mia Glaese Susannah Young Lucy Campbell-Gillingham G. Irving Nat McAleese ELM RALM 246 259 0 21 Mar 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 319 11,953 0 04 Mar 2022
Bayesian Attention Modules Xinjie Fan Shujian Zhang Bo Chen Mingyuan Zhou 117 59 0 20 Oct 2020
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems Sergey Levine Aviral Kumar George Tucker Justin Fu OffRL GP 340 1,960 0 04 May 2020
Efficient Intent Detection with Dual Sentence Encoders I. Casanueva Tadas Temvcinas D. Gerz Matthew Henderson Ivan Vulić VLM 180 453 0 10 Mar 2020
Fine-Tuning Language Models from Human Preferences Daniel M. Ziegler Nisan Stiennon Jeff Wu Tom B. Brown Alec Radford Dario Amodei Paul Christiano G. Irving ALM 286 1,595 0 18 Sep 2019