ResearchTrend.AI
Deep Reinforcement Learning for Dialogue Generation

5 June 2016
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky

Papers citing "Deep Reinforcement Learning for Dialogue Generation"

50 / 165 papers shown
On The Statistical Complexity of Offline Decision-Making
Thanh Nguyen-Tang
R. Arora
OffRL
43
1
0
10 Jan 2025
A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation
W. Zhang
Yiming Cui
Kaiyan Zhang
Yifa Wang
Qingfu Zhu
Lingzhi Li
Ting Liu
55
8
0
28 Oct 2024
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu
Zhiwei He
Xiaofeng Wang
Pengfei Liu
Rui Wang
OSLM
47
3
0
24 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
56
3
0
06 Oct 2024
Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling
Yuxuan Yao
Han Wu
Mingyang Liu
Sichun Luo
Xiongwei Han
Jie Liu
Zhijiang Guo
Linqi Song
56
4
0
03 Oct 2024
Second Order Bounds for Contextual Bandits with Function Approximation
Aldo Pacchiano
48
4
0
24 Sep 2024
Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation
Hui Ma
Bo Zhang
Bo Xu
Jian Wang
Hongfei Lin
Xiao Sun
52
1
0
06 Aug 2024
Self-Emotion Blended Dialogue Generation in Social Simulation Agents
Qiang Zhang
Jason Naradowsky
Yusuke Miyao
18
2
0
03 Aug 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
26
2
0
24 Jun 2024
CET2: Modelling Topic Transitions for Coherent and Engaging Knowledge-Grounded Conversations
Lin Xu
Qixian Zhou
Jinlan Fu
See-Kiong Ng
34
0
0
04 Mar 2024
Runtime Verification of Learning Properties for Reinforcement Learning Algorithms
T. Mannucci
Julio de Oliveira Filho
OffRL
6
0
0
16 Nov 2023
Iteratively Learn Diverse Strategies with State Distance Information
Wei Fu
Weihua Du
Jingwei Li
Sunli Chen
Jingzhao Zhang
Yi Wu
43
3
0
23 Oct 2023
When is Agnostic Reinforcement Learning Statistically Tractable?
Zeyu Jia
Gene Li
Alexander Rakhlin
Ayush Sekhari
Nathan Srebro
OffRL
22
5
0
09 Oct 2023
Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization
Helena Bonaldi
Giuseppe Attanasio
Debora Nozza
Marco Guerini
18
6
0
05 Sep 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
42
10
0
28 Aug 2023
Prompt-Based Length Controlled Generation with Reinforcement Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
17
8
0
23 Aug 2023
f-Divergence Minimization for Sequence-Level Knowledge Distillation
Yuqiao Wen
Zichao Li
Wenyu Du
Lili Mou
30
53
0
27 Jul 2023
Decision-Oriented Dialogue for Human-AI Collaboration
Jessy Lin
Nicholas Tomlin
Jacob Andreas
J. Eisner
LLMAG
18
26
0
31 May 2023
A Framework for Incentivized Collaborative Learning
Xinran Wang
Qi Le
Ahmad Faraz Khan
Jie Ding
A. Anwar
FedML
37
4
0
26 May 2023
Model-Based Simulation for Optimising Smart Reply
Benjamin Towle
Ke Zhou
30
1
0
26 May 2023
Deep RL with Hierarchical Action Exploration for Dialogue Generation
Itsugun Cho
Ryota Takahashi
Yusaku Yanase
Hiroaki Saito
17
2
0
22 Mar 2023
Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging
Guangyao Zheng
Samson Zhou
Vladimir Braverman
M. Jacobs
V. Parekh
OffRL
CLL
11
3
0
22 Feb 2023
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
26
17
0
02 Feb 2023
Gradient Imitation Reinforcement Learning for General Low-Resource Information Extraction
Xuming Hu
Shiao Meng
Chenwei Zhang
Xiangli Yang
Lijie Wen
Irwin King
Philip S. Yu
44
0
0
11 Nov 2022
Syntax-Aware On-the-Fly Code Completion
Wannita Takerngsaksiri
C. Tantithamthavorn
Yuankui Li
24
17
0
09 Nov 2022
Active Countermeasures for Email Fraud
Wentao Chen
Fuzhou Wang
Matthew Edwards
20
5
0
26 Oct 2022
Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook
Baihan Lin
OffRL
AI4TS
24
27
0
24 Oct 2022
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Evan Crothers
Nathalie Japkowicz
H. Viktor
DeLMO
25
107
0
13 Oct 2022
Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence
Chris Callison-Burch
Gaurav Singh Tomar
Lara J. Martin
Daphne Ippolito
Suma Bailis
David Reitter
16
46
0
13 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
31
239
0
03 Oct 2022
Prompting for a conversation: How to control a dialog model?
Josef Valvoda
Yimai Fang
David Vandyke
56
5
0
22 Sep 2022
Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
Waiman Si
Michael Backes
Jeremy Blackburn
Emiliano De Cristofaro
Gianluca Stringhini
Savvas Zannettou
Yang Zhang
26
58
0
07 Sep 2022
CrossDial: An Entertaining Dialogue Dataset of Chinese Crosstalk
Baizhou Huang
Shikang Du
Xiao-Yi Wan
14
0
0
03 Sep 2022
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
59
14
0
11 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
15
49
0
01 Jun 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
22
10
0
25 May 2022
CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models
Bishal Santra
Ravi Ghadia
Manish Gupta
Pawan Goyal
OffRL
15
0
0
21 May 2022
DxFormer: A Decoupled Automatic Diagnostic System Based on Decoder-Encoder Transformer with Dense Symptom Representations
Wei Chen
Cheng Zhong
J. Peng
Zhongyu Wei
MedIm
23
18
0
08 May 2022
Knowledge Infused Decoding
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
25
14
0
06 Apr 2022
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study
Serra Sinem Tekiroğlu
Helena Bonaldi
Margherita Fanton
Marco Guerini
22
43
0
04 Apr 2022
Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
Zihan Zhou
Wei Fu
Bingliang Zhang
Yi Wu
15
28
0
04 Apr 2022
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model
Fei Mi
Yitong Li
Yulong Zeng
Jingyan Zhou
Yasheng Wang
Chuanfei Xu
Lifeng Shang
Xin Jiang
Shiqi Zhao
Qun Liu
ALM
37
18
0
31 Mar 2022
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
Shashi Narayan
Gonçalo Simões
Yao-Min Zhao
Joshua Maynez
Dipanjan Das
Michael Collins
Mirella Lapata
24
30
0
28 Mar 2022
Long Time No See! Open-Domain Conversation with Long-Term Persona Memory
Xinchao Xu
Zhibin Gou
Wenquan Wu
Zheng-Yu Niu
Hua-Hong Wu
Haifeng Wang
Shihang Wang
RALM
25
107
0
11 Mar 2022
Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation
Yunhan Huang
Quanyan Zhu
OffRL
AAML
34
4
0
11 Mar 2022
Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process
C. Shi
Jin Zhu
Ye Shen
S. Luo
Hong Zhu
R. Song
OffRL
21
30
0
22 Feb 2022
Reward Modeling for Mitigating Toxicity in Transformer-based Language Models
Farshid Faal
K. Schmitt
Jia Yuan Yu
11
25
0
19 Feb 2022
A Literature Survey of Recent Advances in Chatbots
Guendalina Caldarini
Sardar F. Jaf
K. McGarry
AI4CE
27
274
0
17 Jan 2022
Differentially Private Regret Minimization in Episodic Markov Decision Processes
Sayak Ray Chowdhury
Xingyu Zhou
21
21
0
20 Dec 2021
EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments
Emmanouil Zaranis
Georgios Paraskevopoulos
Athanasios Katsamanis
Alexandros Potamianos
25
9
0
30 Oct 2021