ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1603.08023
  4. Cited By
How NOT To Evaluate Your Dialogue System: An Empirical Study of
  Unsupervised Evaluation Metrics for Dialogue Response Generation

How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

25 March 2016
Chia-Wei Liu
Ryan J. Lowe
Iulian Serban
Michael Noseworthy
Laurent Charlin
Joelle Pineau
ArXivPDFHTML

Papers citing "How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation"

50 / 292 papers shown
Title
On Reinforcement Learning and Distribution Matching for Fine-Tuning
  Language Models with no Catastrophic Forgetting
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
27
51
0
01 Jun 2022
Commonsense and Named Entity Aware Knowledge Grounded Dialogue
  Generation
Commonsense and Named Entity Aware Knowledge Grounded Dialogue Generation
Deeksha Varshney
Akshara Prabhakar
Asif Ekbal
29
18
0
27 May 2022
A Question-Answer Driven Approach to Reveal Affirmative Interpretations
  from Verbal Negations
A Question-Answer Driven Approach to Reveal Affirmative Interpretations from Verbal Negations
Md Mosharaf Hossain
L. Holman
Anusha Kakileti
T. Kao
N. Brito
A. Mathews
Eduardo Blanco
32
3
0
23 May 2022
Computational Storytelling and Emotions: A Survey
Computational Storytelling and Emotions: A Survey
Yusuke Mori
Hiroaki Yamane
Yusuke Mukuta
Tatsuya Harada
45
2
0
23 May 2022
CORAL: Contextual Response Retrievability Loss Function for Training
  Dialog Generation Models
CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models
Bishal Santra
Ravi Ghadia
Manish Gupta
Pawan Goyal
OffRL
23
0
0
21 May 2022
Target-Guided Dialogue Response Generation Using Commonsense and Data
  Augmentation
Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation
Prakhar Gupta
Harsh Jhamtani
Jeffrey P. Bigham
49
12
0
19 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation
  Datasets
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
43
5
0
13 May 2022
Vector Representations of Idioms in Conversational Systems
Vector Representations of Idioms in Conversational Systems
Tosin Adewumi
F. Liwicki
Marcus Liwicki
50
8
0
07 May 2022
Balancing Multi-Domain Corpora Learning for Open-Domain Response
  Generation
Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation
Yujie Xing
Jason (Jinglun) Cai
Nils Barlaug
Peng Liu
J. Gulla
31
4
0
05 May 2022
State-of-the-art in Open-domain Conversational AI: A Survey
State-of-the-art in Open-domain Conversational AI: A Survey
Tosin Adewumi
F. Liwicki
Marcus Liwicki
32
15
0
02 May 2022
COSPLAY: Concept Set Guided Personalized Dialogue Generation Across Both
  Party Personas
COSPLAY: Concept Set Guided Personalized Dialogue Generation Across Both Party Personas
Chengshi Xu
Pijian Li
Wei Wang
Haoran Yang
Siyun Wang
Chuangbai Xiao
38
26
0
02 May 2022
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog
  Evaluation
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Sarik Ghazarian
Behnam Hedayatnia
Alexandros Papangelis
Yang Liu
Dilek Z. Hakkani-Tür
30
19
0
25 Mar 2022
Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue
  Systems
Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue Systems
Yi-Lin Tuan
Sajjad Beygi
Maryam Fazel-Zarandi
Qiaozi Gao
Alessandra Cervone
William Yang Wang
LRM
29
23
0
20 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation
  of Dialog: Research Directions and Challenges
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
34
21
0
18 Mar 2022
RoMe: A Robust Metric for Evaluating Natural Language Generation
RoMe: A Robust Metric for Evaluating Natural Language Generation
Md. Rony
Liubov Kovriguina
Debanjan Chaudhuri
Ricardo Usbeck
Jens Lehmann
22
12
0
17 Mar 2022
Conversational Recommendation: A Grand AI Challenge
Conversational Recommendation: A Grand AI Challenge
Dietmar Jannach
L. Chen
34
18
0
17 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue
  Systems
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems
Jan Deriu
Don Tuggener
Pius von Daniken
Mark Cieliebak
AAML
19
9
0
28 Feb 2022
Rethinking and Refining the Distinct Metric
Rethinking and Refining the Distinct Metric
Siyang Liu
Sahand Sabour
Yinhe Zheng
Pei Ke
Xiaoyan Zhu
Minlie Huang
36
11
0
28 Feb 2022
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment
  Act Flows
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows
Jianqiao Zhao
Yanyang Li
Wanyu Du
Yangfeng Ji
Dong Yu
M. Lyu
Liwei Wang
33
4
0
14 Feb 2022
Red Teaming Language Models with Language Models
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
13
611
0
07 Feb 2022
Conversational Agents: Theory and Applications
Conversational Agents: Theory and Applications
M. Wahde
M. Virgolin
LLMAG
32
25
0
07 Feb 2022
Towards Personalized Answer Generation in E-Commerce via
  Multi-Perspective Preference Modeling
Towards Personalized Answer Generation in E-Commerce via Multi-Perspective Preference Modeling
Yang Deng
Yaliang Li
Wenxuan Zhang
Bolin Ding
W. Lam
30
36
0
27 Dec 2021
Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Huihan Li
Tianyu Gao
Manan Goenka
Danqi Chen
24
21
0
16 Dec 2021
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue
  Evaluation
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang
L. F. D’Haro
Thomas Friedrichs
Haizhou Li
ELM
25
18
0
14 Dec 2021
Understanding and Improving the Exemplar-based Generation for
  Open-domain Conversation
Understanding and Improving the Exemplar-based Generation for Open-domain Conversation
Seungju Han
Beomsu Kim
Seokjun Seo
Enkhbayar Erdenee
Buru Chang
36
3
0
13 Dec 2021
Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an
  Identity
Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity
Kurt Shuster
Jack Urbanek
Arthur Szlam
Jason Weston
HILM
24
24
0
10 Dec 2021
CO-STAR: Conceptualisation of Stereotypes for Analysis and Reasoning
CO-STAR: Conceptualisation of Stereotypes for Analysis and Reasoning
Teyun Kwon
Anandha Gopalan
30
2
0
01 Dec 2021
Learning to Predict Persona Information forDialogue Personalization
  without Explicit Persona Description
Learning to Predict Persona Information forDialogue Personalization without Explicit Persona Description
Wangchunshu Zhou
Qifei Li
Chenle Li
21
9
0
30 Nov 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
Chen Zhang
João Sedoc
L. F. D’Haro
Rafael E. Banchs
Alexander I. Rudnicky
22
36
0
03 Nov 2021
A Systematic Investigation of Commonsense Knowledge in Large Language
  Models
A Systematic Investigation of Commonsense Knowledge in Large Language Models
Xiang Lorraine Li
A. Kuncoro
Jordan Hoffmann
Cyprien de Masson dÁutume
Phil Blunsom
Aida Nematzadeh
LRM
25
58
0
31 Oct 2021
EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments
EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments
Emmanouil Zaranis
Georgios Paraskevopoulos
Athanasios Katsamanis
Alexandros Potamianos
30
9
0
30 Oct 2021
I Do Not Understand What I Cannot Define: Automatic Question Generation
  With Pedagogically-Driven Content Selection
I Do Not Understand What I Cannot Define: Automatic Question Generation With Pedagogically-Driven Content Selection
Tim Steuer
Anna Filighera
Tobias Meuser
Christoph Rensing
24
10
0
08 Oct 2021
Simulated Annealing for Emotional Dialogue Systems
Simulated Annealing for Emotional Dialogue Systems
Chengzhang Dong
Chenyang Huang
Osmar Zaïane
Lili Mou
34
5
0
22 Sep 2021
A Plug-and-Play Method for Controlled Text Generation
A Plug-and-Play Method for Controlled Text Generation
Damian Pascual
Béni Egressy
Clara Meister
Ryan Cotterell
Roger Wattenhofer
27
89
0
20 Sep 2021
Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and
  Symbolic Logic Rules
Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and Symbolic Logic Rules
Forough Arabshahi
Jennifer Lee
Antoine Bosselut
Yejin Choi
Tom Michael Mitchell
LRM
24
17
0
17 Sep 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain
  Dialogues with Bayesian Optimization
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization
Lei Shen
Haolan Zhan
Xin Shen
Hongshen Chen
Xiaofang Zhao
Xiao-Dan Zhu
43
17
0
14 Sep 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
116
58
0
13 Sep 2021
Explain Me the Painting: Multi-Topic Knowledgeable Art Description
  Generation
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
Zechen Bai
Yuta Nakashima
Noa Garcia
68
43
0
13 Sep 2021
CEM: Commonsense-aware Empathetic Response Generation
CEM: Commonsense-aware Empathetic Response Generation
Sahand Sabour
Chujie Zheng
Minlie Huang
28
149
0
13 Sep 2021
Generating Personalized Dialogue via Multi-Task Meta-Learning
Generating Personalized Dialogue via Multi-Task Meta-Learning
Jing Yang Lee
Kong Aik Lee
W. Gan
33
14
0
07 Aug 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
ELM
30
8
0
03 Aug 2021
WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation
  for Multi-turn Dialogue
WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue
Anant Khandelwal
OffRL
24
6
0
01 Aug 2021
An Evaluation of Generative Pre-Training Model-based Therapy Chatbot for
  Caregivers
An Evaluation of Generative Pre-Training Model-based Therapy Chatbot for Caregivers
Lu Wang
Munif Ishad Mujib
Jake Williams
G. Demiris
Jina Huh-Yoo
AI4MH
32
32
0
28 Jul 2021
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable
  Features
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin
David Reitter
Gaurav Singh Tomar
Dipanjan Das
172
101
0
14 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
56
95
0
01 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated
  Text
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
54
398
0
30 Jun 2021
Do Encoder Representations of Generative Dialogue Models Encode
  Sufficient Information about the Task ?
Do Encoder Representations of Generative Dialogue Models Encode Sufficient Information about the Task ?
Prasanna Parthasarathi
J. Pineau
Sarath Chandar
13
2
0
20 Jun 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking
  and Evaluation
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation
Prakhar Gupta
Yulia Tsvetkov
Jeffrey P. Bigham
42
22
0
10 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
36
105
0
07 Jun 2021
GTM: A Generative Triple-Wise Model for Conversational Question
  Generation
GTM: A Generative Triple-Wise Model for Conversational Question Generation
Lei Shen
Fandong Meng
Jinchao Zhang
Yang Feng
Jie Zhou
19
13
0
07 Jun 2021
Previous
123456
Next