Designing Precise and Robust Dialogue Response Evaluators

10 April 2020

Papers citing "Designing Precise and Robust Dialogue Response Evaluators"

Title
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems Justin Vasselli Adam Nohejl Taro Watanabe AAML 49 0 0 12 Jan 2025
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues John Mendonça Isabel Trancoso A. Lavie 34 3 0 16 Jul 2024
Psychological Metrics for Dialog System Evaluation Salvatore Giorgi Shreya Havaldar Farhan S. Ahmed Zuhaib Akhtar Shalaka Vaidya Gary Pan Pallavi V. Kulkarni H. A. Schwartz Joao Sedoc 22 2 0 24 May 2023
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment Chen Zhang L. F. D’Haro Qiquan Zhang Thomas Friedrichs Haizhou Li 26 7 0 18 Dec 2022
State-of-the-art generalisation research in NLP: A taxonomy and review Dieuwke Hupkes Mario Giulianelli Verna Dankers Mikel Artetxe Yanai Elazar ... Leila Khalatbari Maria Ryskina Rita Frieske Ryan Cotterell Zhijing Jin 121 94 0 06 Oct 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue Pengfei Zhang Xiao-fei Hu Kaidong Yu Jian Wang Song-Bo Han Cao Liu C. Yuan 24 7 0 19 Jun 2022
Empathic Conversations: A Multi-level Dataset of Contextualized Conversations Damilola Omitaomu Shabnam Tafreshi Tingting Liu Sven Buechel Chris Callison-Burch J. Eichstaedt Lyle Ungar João Sedoc 44 48 0 25 May 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning Prakhar Gupta Cathy Jiao Yi-Ting Yeh Shikib Mehri M. Eskénazi Jeffrey P. Bigham ALM 44 47 0 25 May 2022
Automatic Evaluation and Moderation of Open-domain Dialogue Systems Chen Zhang João Sedoc L. F. D’Haro Rafael E. Banchs Alexander I. Rudnicky 22 36 0 03 Nov 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization Lei Shen Haolan Zhan Xin Shen Hongshen Chen Xiaofang Zhao Xiao-Dan Zhu 38 17 0 14 Sep 2021
How to Evaluate Your Dialogue Models: A Review of Approaches Xinmeng Li Wansen Wu Long Qin Quanjun Yin ELM 30 8 0 03 Aug 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation Prakhar Gupta Yulia Tsvetkov Jeffrey P. Bigham 39 22 0 10 Jun 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey Jinjie Ni Tom Young Vlad Pandelea Fuzhao Xue Min Zhang 54 268 0 10 May 2021
$Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering Or Honovich Leshem Choshen Roee Aharoni Ella Neeman Idan Szpektor Omri Abend HILM 27 138 0 16 Apr 2021
A Survey of Evaluation Metrics Used for NLG Systems Ananya B. Sai Akash Kumar Mohankumar Mitesh M. Khapra ELM 33 228 0 27 Aug 2020