How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

25 March 2016
Chia-Wei Liu
Ryan J. Lowe
Iulian Serban
Michael Noseworthy
Laurent Charlin
Joelle Pineau

Papers citing "How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation"

50 / 293 papers shown
Probing Neural Dialog Models for Conversational Understanding
Abdelrhman Saleh
Tovly Deutsch
Stephen Casper
Yonatan Belinkov
Stuart M. Shieber
21
13
0
07 Jun 2020
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation
Weixin Liang
James Zou
Zhou Yu
ELM
34
33
0
21 May 2020
SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling
F. S. Bao
Hebi Li
Ge Luo
Minghui Qiu
Yinfei Yang
Youbiao He
Cen Chen
24
4
0
13 May 2020
Response-Anticipated Memory for On-Demand Knowledge Integration in Response Generation
Zhiliang Tian
Wei Bi
Dongkyu Lee
Lanqing Xue
Yiping Song
Xiaojiang Liu
N. Zhang
27
25
0
13 May 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
19
69
0
08 May 2020
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
Esin Durmus
He He
Mona T. Diab
HILM
23
385
0
07 May 2020
Learning an Unreferenced Metric for Online Dialogue Evaluation
Koustuv Sinha
Prasanna Parthasarathi
Jasmine Wang
Ryan J. Lowe
William L. Hamilton
Joelle Pineau
OffRL
29
84
0
01 May 2020
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
Shikib Mehri
M. Eskénazi
17
219
0
01 May 2020
CDL: Curriculum Dual Learning for Emotion-Controllable Response Generation
Lei Shen
Yang Feng
34
87
0
01 May 2020
KPQA: A Metric for Generative Question Answering Using Keyphrase Weights
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Doo Soon Kim
Trung Bui
Joongbo Shin
Kyomin Jung
24
0
0
01 May 2020
Question Rewriting for Conversational Question Answering
Svitlana Vakulenko
Shayne Longpre
Zhucheng Tu
R. Anantha
20
175
0
30 Apr 2020
Learning to Update Natural Language Comments Based on Code Changes
Sheena Panthaplackel
Pengyu Nie
Miloš Gligorić
Junyi Jessy Li
Raymond J. Mooney
35
63
0
25 Apr 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
24
351
0
21 Apr 2020
A Survey of Document Grounded Dialogue Systems (DGDS)
Longxuan Ma
Weinan Zhang
Mingda Li
Ting Liu
32
19
0
17 Apr 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
46
1,450
0
09 Apr 2020
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
Alex Wang
Kyunghyun Cho
M. Lewis
HILM
36
472
0
08 Apr 2020
A Survey on Conversational Recommender Systems
Dietmar Jannach
A. Manzoor
Wanling Cai
Li Chen
18
405
0
01 Apr 2020
Variational Transformers for Diverse Response Generation
Zhaojiang Lin
Genta Indra Winata
Peng Xu
Zihan Liu
Pascale Fung
DRL
21
51
0
28 Mar 2020
XPersona: Evaluating Multilingual Personalized Chatbot
Zhaojiang Lin
Zihan Liu
Genta Indra Winata
Samuel Cahyawijaya
Andrea Madotto
Yejin Bang
Etsuko Ishii
Pascale Fung
50
57
0
17 Mar 2020
Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network
Shaoxiong Feng
Hongshen Chen
Kan Li
Dawei Yin
GAN
51
25
0
04 Mar 2020
A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation
Minghong Xu
Piji Li
Haoran Yang
Pengjie Ren
Zhaochun Ren
Zhumin Chen
Jun Ma
26
31
0
06 Feb 2020
Towards a Human-like Open-Domain Chatbot
Daniel De Freitas
Minh-Thang Luong
David R. So
Jamie Hall
Noah Fiedel
...
Zi Yang
Apoorv Kulshreshtha
Gaurav Nemade
Yifeng Lu
Quoc V. Le
42
924
0
27 Jan 2020
Paraphrase Generation with Latent Bag of Words
Yao Fu
Yansong Feng
John P. Cunningham
BDL
25
91
0
07 Jan 2020
Going Beneath the Surface: Evaluating Image Captioning for Grammaticality, Truthfulness and Diversity
Huiyuan Xie
Tom Sherborne
A. Kuhnle
Ann A. Copestake
DiffM
25
9
0
19 Dec 2019
Knowledge-based Conversational Search
Svitlana Vakulenko
19
13
0
14 Dec 2019
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
58
944
0
04 Dec 2019
Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context
Yichi Zhang
Zhijian Ou
Zhou Yu
27
182
0
24 Nov 2019
Social Bias Frames: Reasoning about Social and Power Implications of Language
Maarten Sap
Saadia Gabriel
Lianhui Qin
Dan Jurafsky
Noah A. Smith
Yejin Choi
42
486
0
10 Nov 2019
Automatic Reminiscence Therapy for Dementia
Mariona Carós
M. Garolera
Petia Radeva
Xavier Giró-i-Nieto
27
40
0
25 Oct 2019
Unsupervised Context Rewriting for Open Domain Conversation
Kun Zhou
Kai Zhang
Yu Wu
Shujie Liu
Jingsong Yu
LRM
16
29
0
18 Oct 2019
PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
33
268
0
17 Oct 2019
Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models
Tianxing He
Jun Liu
Kyunghyun Cho
Myle Ott
Bing-Quan Liu
James R. Glass
Fuchun Peng
CLL
35
9
0
16 Oct 2019
Learning from Fact-checkers: Analysis and Generation of Fact-checking Language
Nguyen Vo
Kyumin Lee
14
68
0
05 Oct 2019
DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs
Yi-Lin Tuan
Yun-Nung Chen
Hung-yi Lee
21
71
0
01 Oct 2019
Do Massively Pretrained Language Models Make Better Storytellers?
A. See
Aneesh S. Pappu
Rohun Saxena
Akhila Yerukola
Christopher D. Manning
45
166
0
24 Sep 2019
Counterfactual Story Reasoning and Generation
Lianhui Qin
Antoine Bosselut
Ari Holtzman
Chandra Bhagavatula
Elizabeth Clark
Yejin Choi
LRM
27
141
0
09 Sep 2019
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons
Margaret Li
Jason Weston
Stephen Roller
31
176
0
06 Sep 2019
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models
Thomas Scialom
Sylvain Lamprier
Benjamin Piwowarski
Jacopo Staiano
27
149
0
04 Sep 2019
Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs
Dongyeop Kang
Hiroaki Hayashi
A. Black
Eduard H. Hovy
24
8
0
30 Aug 2019
Ensemble-Based Deep Reinforcement Learning for Chatbots
Heriberto Cuayáhuitl
Donghyeon Lee
Seonghan Ryu
Yongjin Cho
Sungja Choi
Satish Reddy Indurthi
Seunghak Yu
Hyungtak Choi
Inchul Hwang
J. Kim
OffRL
23
69
0
27 Aug 2019
Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
Heriberto Cuayáhuitl
Donghyeon Lee
Seonghan Ryu
Sungja Choi
Inchul Hwang
J. Kim
OffRL
42
6
0
27 Aug 2019
Deep Learning Based Chatbot Models
Richard Csaky
29
46
0
23 Aug 2019
A Multi-Turn Emotionally Engaging Dialog Model
Yubo Xie
Ekaterina Svikhnushina
P. Pu
16
15
0
15 Aug 2019
Fine-Grained Sentence Functions for Short-Text Conversation
Wei Bi
Jun Gao
Xiaojiang Liu
Shuming Shi
14
15
0
24 Jul 2019
Deep Conversational Recommender in Travel
Lizi Liao
Ryuichi Takanobu
Yunshan Ma
Xun Yang
Minlie Huang
Tat-Seng Chua
BDL
21
45
0
25 Jun 2019
Conversational Response Re-ranking Based on Event Causality and Role Factored Tensor Event Embedding
Shohei Tanaka
Koichiro Yoshino
Katsuhito Sudoh
Satoshi Nakamura
22
4
0
24 Jun 2019
Emotionally-Aware Chatbots: A Survey
Endang Wahyu Pamungkas
29
39
0
24 Jun 2019
DAL: Dual Adversarial Learning for Dialogue Generation
Shaobo Cui
Rongzhong Lian
Di Jiang
Yuanfeng Song
Siqi Bao
Yong-jia Jiang
28
23
0
23 Jun 2019
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
Asma Ghandeharioun
J. Shen
Natasha Jaques
Craig Ferguson
Noah J. Jones
Àgata Lapedriza
Rosalind W. Picard
14
91
0
21 Jun 2019
Modeling Semantic Relationship in Multi-turn Conversations with Hierarchical Latent Variables
Lei Shen
Yang Feng
Haolan Zhan
BDL
33
29
0
18 Jun 2019