How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

25 March 2016

Papers citing "How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation"

50 / 293 papers shown

Title | Authors | Tag | Counts | Date
Probing Neural Dialog Models for Conversational Understanding | Abdelrhman Saleh, Tovly Deutsch, Stephen Casper, Yonatan Belinkov, Stuart M. Shieber | - | 21 13 0 | 07 Jun 2020
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation | Weixin Liang, James Zou, Zhou Yu | ELM | 34 33 0 | 21 May 2020
SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling | F. S. Bao, Hebi Li, Ge Luo, Minghui Qiu, Yinfei Yang, Youbiao He, Cen Chen | - | 24 4 0 | 13 May 2020
Response-Anticipated Memory for On-Demand Knowledge Integration in Response Generation | Zhiliang Tian, Wei Bi, Dongkyu Lee, Lanqing Xue, Yiping Song, Xiaojiang Liu, N. Zhang | - | 27 25 0 | 13 May 2020
History for Visual Dialog: Do we really need it? | Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser | VLM | 19 69 0 | 08 May 2020
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization | Esin Durmus, He He, Mona T. Diab | HILM | 23 385 0 | 07 May 2020
Learning an Unreferenced Metric for Online Dialogue Evaluation | Koustuv Sinha, Prasanna Parthasarathi, Jasmine Wang, Ryan J. Lowe, William L. Hamilton, Joelle Pineau | OffRL | 29 84 0 | 01 May 2020
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation | Shikib Mehri, M. Eskénazi | - | 17 219 0 | 01 May 2020
CDL: Curriculum Dual Learning for Emotion-Controllable Response Generation | Lei Shen, Yang Feng | - | 34 87 0 | 01 May 2020
KPQA: A Metric for Generative Question Answering Using Keyphrase Weights | Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Joongbo Shin, Kyomin Jung | - | 24 0 0 | 01 May 2020
Question Rewriting for Conversational Question Answering | Svitlana Vakulenko, Shayne Longpre, Zhucheng Tu, R. Anantha | - | 20 175 0 | 30 Apr 2020
Learning to Update Natural Language Comments Based on Code Changes | Sheena Panthaplackel, Pengyu Nie, Miloš Gligorić, Junyi Jessy Li, Raymond J. Mooney | - | 35 63 0 | 25 Apr 2020
Experience Grounds Language | Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, ..., Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph P. Turian | - | 24 351 0 | 21 Apr 2020
A Survey of Document Grounded Dialogue Systems (DGDS) | Longxuan Ma, Weinan Zhang, Mingda Li, Ting Liu | - | 32 19 0 | 17 Apr 2020
BLEURT: Learning Robust Metrics for Text Generation | Thibault Sellam, Dipanjan Das, Ankur P. Parikh | - | 46 1,450 0 | 09 Apr 2020
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries | Alex Jinpeng Wang, Kyunghyun Cho, M. Lewis | HILM | 36 472 0 | 08 Apr 2020
A Survey on Conversational Recommender Systems | Dietmar Jannach, A. Manzoor, Wanling Cai, Li Chen | - | 18 405 0 | 01 Apr 2020
Variational Transformers for Diverse Response Generation | Zhaojiang Lin, Genta Indra Winata, Peng Xu, Zihan Liu, Pascale Fung | DRL | 21 51 0 | 28 Mar 2020
XPersona: Evaluating Multilingual Personalized Chatbot | Zhaojiang Lin, Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Yejin Bang, Etsuko Ishii, Pascale Fung | - | 50 57 0 | 17 Mar 2020
Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network | Shaoxiong Feng, Hongshen Chen, Kan Li, Dawei Yin | GAN | 51 25 0 | 04 Mar 2020
A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation | Minghong Xu, Piji Li, Haoran Yang, Pengjie Ren, Zhaochun Ren, Zhumin Chen, Jun Ma | - | 26 31 0 | 06 Feb 2020
Towards a Human-like Open-Domain Chatbot | Daniel De Freitas, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, ..., Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le | - | 42 924 0 | 27 Jan 2020
Paraphrase Generation with Latent Bag of Words | Yao Fu, Yansong Feng, John P. Cunningham | BDL | 25 91 0 | 07 Jan 2020
Going Beneath the Surface: Evaluating Image Captioning for Grammaticality, Truthfulness and Diversity | Huiyuan Xie, Tom Sherborne, A. Kuhnle, Ann A. Copestake | DiffM | 25 9 0 | 19 Dec 2019
Knowledge-based Conversational Search | Svitlana Vakulenko | - | 19 13 0 | 14 Dec 2019
Plug and Play Language Models: A Simple Approach to Controlled Text Generation | Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, J. Yosinski, Rosanne Liu | KELM | 58 944 0 | 04 Dec 2019
Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context | Yichi Zhang, Zhijian Ou, Zhou Yu | - | 27 182 0 | 24 Nov 2019
Social Bias Frames: Reasoning about Social and Power Implications of Language | Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, Yejin Choi | - | 42 486 0 | 10 Nov 2019
Automatic Reminiscence Therapy for Dementia | Mariona Carós, M. Garolera, Petia Radeva, Xavier Giró-i-Nieto | - | 27 40 0 | 25 Oct 2019
Unsupervised Context Rewriting for Open Domain Conversation | Kun Zhou, Kai Zhang, Yu Wu, Shujie Liu, Jingsong Yu | LRM | 16 29 0 | 18 Oct 2019
PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable | Siqi Bao, H. He, Fan Wang, Hua Wu, Haifeng Wang | - | 33 268 0 | 17 Oct 2019
Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models | Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing-Quan Liu, James R. Glass, Fuchun Peng | CLL | 35 9 0 | 16 Oct 2019
Learning from Fact-checkers: Analysis and Generation of Fact-checking Language | Nguyen Vo, Kyumin Lee | - | 14 68 0 | 05 Oct 2019
DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs | Yi-Lin Tuan, Yun-Nung Chen, Hung-yi Lee | - | 21 71 0 | 01 Oct 2019
Do Massively Pretrained Language Models Make Better Storytellers? | A. See, Aneesh S. Pappu, Rohun Saxena, Akhila Yerukola, Christopher D. Manning | - | 45 166 0 | 24 Sep 2019
Counterfactual Story Reasoning and Generation | Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, Yejin Choi | LRM | 27 141 0 | 09 Sep 2019
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | Margaret Li, Jason Weston, Stephen Roller | - | 31 176 0 | 06 Sep 2019
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models | Thomas Scialom, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano | - | 27 149 0 | 04 Sep 2019
Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs | Dongyeop Kang, Hiroaki Hayashi, A. Black, Eduard H. Hovy | - | 24 8 0 | 30 Aug 2019
Ensemble-Based Deep Reinforcement Learning for Chatbots | Heriberto Cuayáhuitl, Donghyeon Lee, Seonghan Ryu, Yongjin Cho, Sungja Choi, Satish Reddy Indurthi, Seunghak Yu, Hyungtak Choi, Inchul Hwang, J. Kim | OffRL | 23 69 0 | 27 Aug 2019
Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards | Heriberto Cuayáhuitl, Donghyeon Lee, Seonghan Ryu, Sungja Choi, Inchul Hwang, J. Kim | OffRL | 42 6 0 | 27 Aug 2019
Deep Learning Based Chatbot Models | Richard Csaky | - | 29 46 0 | 23 Aug 2019
A Multi-Turn Emotionally Engaging Dialog Model | Yubo Xie, Ekaterina Svikhnushina, P. Pu | - | 16 15 0 | 15 Aug 2019
Fine-Grained Sentence Functions for Short-Text Conversation | Wei Bi, Jun Gao, Xiaojiang Liu, Shuming Shi | - | 14 15 0 | 24 Jul 2019
Deep Conversational Recommender in Travel | Lizi Liao, Ryuichi Takanobu, Yunshan Ma, Xun Yang, Minlie Huang, Tat-Seng Chua | BDL | 21 45 0 | 25 Jun 2019
Conversational Response Re-ranking Based on Event Causality and Role Factored Tensor Event Embedding | Shohei Tanaka, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura | - | 22 4 0 | 24 Jun 2019
Emotionally-Aware Chatbots: A Survey | Endang Wahyu Pamungkas | - | 29 39 0 | 24 Jun 2019
DAL: Dual Adversarial Learning for Dialogue Generation | Shaobo Cui, Rongzhong Lian, Di Jiang, Yuanfeng Song, Siqi Bao, Yong-jia Jiang | - | 28 23 0 | 23 Jun 2019
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Asma Ghandeharioun, J. Shen, Natasha Jaques, Craig Ferguson, Noah J. Jones, Àgata Lapedriza, Rosalind W. Picard | - | 14 91 0 | 21 Jun 2019
Modeling Semantic Relationship in Multi-turn Conversations with Hierarchical Latent Variables | Lei Shen, Yang Feng, Haolan Zhan | BDL | 33 29 0 | 18 Jun 2019