Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue

20 June 2024
Huifang Du
Shuqin Li
Minghao Wu
Xuejing Feng
Yuan-Fang Li
Haofen Wang
    OffRL
arXiv: 2406.14457

Papers citing "Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue"

15 / 15 papers shown
Behavior Alignment via Reward Function Optimization
Dhawal Gupta
Yash Chandak
Scott M. Jordan
Philip S. Thomas
Bruno Castro da Silva
53
10
0
29 Oct 2023
Are LLMs All You Need for Task-Oriented Dialogue?
Vojtěch Hudeček
Ondřej Dušek
44
60
0
13 Apr 2023
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
74
244
0
03 Oct 2022
SPACE-3: Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation
Wanwei He
Yinpei Dai
Min Yang
Jian Sun
Fei Huang
Luo Si
Yongbin Li
48
62
0
14 Sep 2022
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System
Yixuan Su
Lei Shu
Elman Mansimov
Arshit Gupta
Deng Cai
Yi-An Lai
Yi Zhang
174
192
0
29 Sep 2021
UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2
Yunyi Yang
Yunhao Li
Xiaojun Quan
61
190
0
07 Dec 2020
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Zhaojiang Lin
Andrea Madotto
Genta Indra Winata
Pascale Fung
64
172
0
25 Sep 2020
GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning
Jianfeng Liu
Feiyang Pan
Ling Luo
OffRL
39
23
0
24 May 2020
Multi-domain Dialogue State Tracking as Dynamic Knowledge Graph Enhanced Question Answering
Li Zhou
Kevin Small
49
85
0
07 Nov 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
434
1,664
0
18 Sep 2019
Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems
Chien-Sheng Wu
Andrea Madotto
Ehsan Hosseini-Asl
Caiming Xiong
R. Socher
Pascale Fung
73
434
0
21 May 2019
Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning
Yuexin Wu
Xiujun Li
Jingjing Liu
Jianfeng Gao
Yiming Yang
49
43
0
19 Nov 2018
MultiWOZ -- A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
Paweł Budzianowski
Tsung-Hsien Wen
Bo-Hsiang Tseng
I. Casanueva
Stefan Ultes
Osman Ramadan
Milica Gasic
144
1,306
0
29 Sep 2018
Neural Approaches to Conversational AI
Jianfeng Gao
Michel Galley
Lihong Li
76
672
0
21 Sep 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
288
18,685
0
20 Jul 2017