How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

25 March 2016

Papers citing "How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation"

50 / 292 papers shown

On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting Tomasz Korbak Hady ElSahar Germán Kruszewski Marc Dymetman CLL 27 51 0 01 Jun 2022
Commonsense and Named Entity Aware Knowledge Grounded Dialogue Generation Deeksha Varshney Akshara Prabhakar Asif Ekbal 29 18 0 27 May 2022
A Question-Answer Driven Approach to Reveal Affirmative Interpretations from Verbal Negations Md Mosharaf Hossain L. Holman Anusha Kakileti T. Kao N. Brito A. Mathews Eduardo Blanco 32 3 0 23 May 2022
Computational Storytelling and Emotions: A Survey Yusuke Mori Hiroaki Yamane Yusuke Mukuta Tatsuya Harada 45 2 0 23 May 2022
CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models Bishal Santra Ravi Ghadia Manish Gupta Pawan Goyal OffRL 23 0 0 21 May 2022
Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation Prakhar Gupta Harsh Jhamtani Jeffrey P. Bigham 49 12 0 19 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets Philippe Laban Chien-Sheng Wu Wenhao Liu Caiming Xiong 43 5 0 13 May 2022
Vector Representations of Idioms in Conversational Systems Tosin Adewumi F. Liwicki Marcus Liwicki 50 8 0 07 May 2022
Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation Yujie Xing Jason (Jinglun) Cai Nils Barlaug Peng Liu J. Gulla 31 4 0 05 May 2022
State-of-the-art in Open-domain Conversational AI: A Survey Tosin Adewumi F. Liwicki Marcus Liwicki 32 15 0 02 May 2022
COSPLAY: Concept Set Guided Personalized Dialogue Generation Across Both Party Personas Chengshi Xu Pijian Li Wei Wang Haoran Yang Siyun Wang Chuangbai Xiao 38 26 0 02 May 2022
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation Sarik Ghazarian Behnam Hedayatnia Alexandros Papangelis Yang Liu Dilek Z. Hakkani-Tür 30 19 0 25 Mar 2022
Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue Systems Yi-Lin Tuan Sajjad Beygi Maryam Fazel-Zarandi Qiaozi Gao Alessandra Cervone William Yang Wang LRM 29 23 0 20 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges Shikib Mehri Jinho Choi L. F. D’Haro Jan Deriu M. Eskénazi ... David Traum Yi-Ting Yeh Zhou Yu Yizhe Zhang Chen Zhang 34 21 0 18 Mar 2022
RoMe: A Robust Metric for Evaluating Natural Language Generation Md. Rony Liubov Kovriguina Debanjan Chaudhuri Ricardo Usbeck Jens Lehmann 22 12 0 17 Mar 2022
Conversational Recommendation: A Grand AI Challenge Dietmar Jannach L. Chen 34 18 0 17 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems Jan Deriu Don Tuggener Pius von Daniken Mark Cieliebak AAML 19 9 0 28 Feb 2022
Rethinking and Refining the Distinct Metric Siyang Liu Sahand Sabour Yinhe Zheng Pei Ke Xiaoyan Zhu Minlie Huang 36 11 0 28 Feb 2022
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows Jianqiao Zhao Yanyang Li Wanyu Du Yangfeng Ji Dong Yu M. Lyu Liwei Wang 33 4 0 14 Feb 2022
Red Teaming Language Models with Language Models Ethan Perez Saffron Huang Francis Song Trevor Cai Roman Ring John Aslanides Amelia Glaese Nat McAleese G. Irving AAML 13 611 0 07 Feb 2022
Conversational Agents: Theory and Applications M. Wahde M. Virgolin LLMAG 32 25 0 07 Feb 2022
Towards Personalized Answer Generation in E-Commerce via Multi-Perspective Preference Modeling Yang Deng Yaliang Li Wenxuan Zhang Bolin Ding W. Lam 30 36 0 27 Dec 2021
Ditch the Gold Standard: Re-evaluating Conversational Question Answering Huihan Li Tianyu Gao Manan Goenka Danqi Chen 24 21 0 16 Dec 2021
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation Chen Zhang L. F. D’Haro Thomas Friedrichs Haizhou Li ELM 25 18 0 14 Dec 2021
Understanding and Improving the Exemplar-based Generation for Open-domain Conversation Seungju Han Beomsu Kim Seokjun Seo Enkhbayar Erdenee Buru Chang 36 3 0 13 Dec 2021
Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity Kurt Shuster Jack Urbanek Arthur Szlam Jason Weston HILM 24 24 0 10 Dec 2021
CO-STAR: Conceptualisation of Stereotypes for Analysis and Reasoning Teyun Kwon Anandha Gopalan 30 2 0 01 Dec 2021
Learning to Predict Persona Information for Dialogue Personalization without Explicit Persona Description Wangchunshu Zhou Qifei Li Chenle Li 21 9 0 30 Nov 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems Chen Zhang João Sedoc L. F. D’Haro Rafael E. Banchs Alexander I. Rudnicky 22 36 0 03 Nov 2021
A Systematic Investigation of Commonsense Knowledge in Large Language Models Xiang Lorraine Li A. Kuncoro Jordan Hoffmann Cyprien de Masson d'Autume Phil Blunsom Aida Nematzadeh LRM 25 58 0 31 Oct 2021
EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments Emmanouil Zaranis Georgios Paraskevopoulos Athanasios Katsamanis Alexandros Potamianos 30 9 0 30 Oct 2021
I Do Not Understand What I Cannot Define: Automatic Question Generation With Pedagogically-Driven Content Selection Tim Steuer Anna Filighera Tobias Meuser Christoph Rensing 24 10 0 08 Oct 2021
Simulated Annealing for Emotional Dialogue Systems Chengzhang Dong Chenyang Huang Osmar Zaïane Lili Mou 34 5 0 22 Sep 2021
A Plug-and-Play Method for Controlled Text Generation Damian Pascual Béni Egressy Clara Meister Ryan Cotterell Roger Wattenhofer 27 89 0 20 Sep 2021
Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and Symbolic Logic Rules Forough Arabshahi Jennifer Lee Antoine Bosselut Yejin Choi Tom Michael Mitchell LRM 24 17 0 17 Sep 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization Lei Shen Haolan Zhan Xin Shen Hongshen Chen Xiaofang Zhao Xiao-Dan Zhu 43 17 0 14 Sep 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics Ananya B. Sai Tanay Dixit D. Y. Sheth S. Mohan Mitesh M. Khapra AAML 116 58 0 13 Sep 2021
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation Zechen Bai Yuta Nakashima Noa Garcia 68 43 0 13 Sep 2021
CEM: Commonsense-aware Empathetic Response Generation Sahand Sabour Chujie Zheng Minlie Huang 28 149 0 13 Sep 2021
Generating Personalized Dialogue via Multi-Task Meta-Learning Jing Yang Lee Kong Aik Lee W. Gan 33 14 0 07 Aug 2021
How to Evaluate Your Dialogue Models: A Review of Approaches Xinmeng Li Wansen Wu Long Qin Quanjun Yin ELM 30 8 0 03 Aug 2021
WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue Anant Khandelwal OffRL 24 6 0 01 Aug 2021
An Evaluation of Generative Pre-Training Model-based Therapy Chatbot for Caregivers Lu Wang Munif Ishad Mujib Jake Williams G. Demiris Jina Huh-Yoo AI4MH 32 32 0 28 Jul 2021
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features Hannah Rashkin David Reitter Gaurav Singh Tomar Dipanjan Das 172 101 0 14 Jul 2021
Productivity, Portability, Performance: Data-Centric Python Yiheng Wang Yao Zhang Yanzhang Wang Yan Wan Jiao Wang Zhongyuan Wu Yuhao Yang Bowen She 56 95 0 01 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text Elizabeth Clark Tal August Sofia Serrano Nikita Haduong Suchin Gururangan Noah A. Smith DeLMO 54 398 0 30 Jun 2021
Do Encoder Representations of Generative Dialogue Models Encode Sufficient Information about the Task? Prasanna Parthasarathi J. Pineau Sarath Chandar 13 2 0 20 Jun 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation Prakhar Gupta Yulia Tsvetkov Jeffrey P. Bigham 42 22 0 10 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics Yi-Ting Yeh M. Eskénazi Shikib Mehri 36 105 0 07 Jun 2021
GTM: A Generative Triple-Wise Model for Conversational Question Generation Lei Shen Fandong Meng Jinchao Zhang Yang Feng Jie Zhou 19 13 0 07 Jun 2021