ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.10342
  4. Cited By
Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning
  for Task-oriented Dialogue Systems

Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems

20 February 2023
Yihao Feng
Shentao Yang
Shujian Zhang
Jianguo Zhang
Caiming Xiong
Mi Zhou
Haiquan Wang
    OffRL
ArXivPDFHTML

Papers citing "Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems"

8 / 8 papers shown
Title
Are LLMs All You Need for Task-Oriented Dialogue?
Are LLMs All You Need for Task-Oriented Dialogue?
Vojtvech Hudevcek
Ondrej Dusek
26
56
0
13 Apr 2023
Offline RL for Natural Language Generation with Implicit Language Q
  Learning
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
125
102
0
05 Jun 2022
Teaching language models to support answers with verified quotes
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
246
259
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
319
11,953
0
04 Mar 2022
Bayesian Attention Modules
Bayesian Attention Modules
Xinjie Fan
Shujian Zhang
Bo Chen
Mingyuan Zhou
117
59
0
20 Oct 2020
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on
  Open Problems
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
340
1,960
0
04 May 2020
Efficient Intent Detection with Dual Sentence Encoders
Efficient Intent Detection with Dual Sentence Encoders
I. Casanueva
Tadas Temvcinas
D. Gerz
Matthew Henderson
Ivan Vulić
VLM
180
453
0
10 Mar 2020
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
286
1,595
0
18 Sep 2019
1