Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining

23 September 2020

Ananya B. Sai

Akash Kumar Mohankumar

Siddharth Arora

Mitesh M. Khapra

Papers citing "Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining"

BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation Suvodip Dey M. Desarkar OffRL 46 0 0 20 Jan 2025
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues John Mendonça Isabel Trancoso A. Lavie 39 3 0 16 Jul 2024
SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation Kun Zhao Bohao Yang Chen Tang Chenghua Lin Liang Zhan 49 5 0 24 May 2024
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects Minqian Liu Ying Shen Zhiyang Xu Yixin Cao Eunah Cho Vaibhav Kumar Reza Ghanadan Lifu Huang ELM LM&MA ALM 52 25 0 15 Nov 2023
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering Pei Ke Fei Huang Fei Mi Yasheng Wang Qun Liu Xiaoyan Zhu Minlie Huang ReLM ELM 41 10 0 13 Jul 2023
Pragmatically Appropriate Diversity for Dialogue Evaluation Katherine Stasaski Marti A. Hearst 27 1 0 06 Apr 2023
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment Chen Zhang L. F. D’Haro Qiquan Zhang Thomas Friedrichs Haizhou Li 33 7 0 18 Dec 2022
Pneg: Prompt-based Negative Response Generation for Dialogue Response Selection Task Nyoungwoo Lee ChaeHun Park Ho-Jin Choi Jaegul Choo 35 6 0 31 Oct 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation Longxuan Ma Ziyu Zhuang Weinan Zhang Mingda Li Ting Liu 41 4 0 17 Aug 2022
Grounding in social media: An approach to building a chit-chat dialogue model Ritvik Choudhary Daisuke Kawahara 21 4 0 12 Jun 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning Prakhar Gupta Cathy Jiao Yi-Ting Yeh Shikib Mehri M. Eskénazi Jeffrey P. Bigham ALM 44 47 0 25 May 2022
CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models Bishal Santra Ravi Ghadia Manish Gupta Pawan Goyal OffRL 23 0 0 21 May 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges Shikib Mehri Jinho Choi L. F. D’Haro Jan Deriu M. Eskénazi ... David Traum Yi-Ting Yeh Zhou Yu Yizhe Zhang Chen Zhang 34 21 0 18 Mar 2022
Ditch the Gold Standard: Re-evaluating Conversational Question Answering Huihan Li Tianyu Gao Manan Goenka Danqi Chen 24 21 0 16 Dec 2021
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation Chen Zhang L. F. D’Haro Thomas Friedrichs Haizhou Li ELM 25 18 0 14 Dec 2021
Representation Learning for Conversational Data using Discourse Mutual Information Maximization Bishal Santra Sumegh Roychowdhury Aishik Mandal Vasu Gurram Atharva Naik Manish Gupta Pawan Goyal SSL 27 4 0 04 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems Chen Zhang João Sedoc L. F. D’Haro Rafael E. Banchs Alexander I. Rudnicky 22 36 0 03 Nov 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics Ananya B. Sai Tanay Dixit D. Y. Sheth S. Mohan Mitesh M. Khapra AAML 116 58 0 13 Sep 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation Prakhar Gupta Yulia Tsvetkov Jeffrey P. Bigham 47 22 0 10 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics Yi-Ting Yeh M. Eskénazi Shikib Mehri 36 105 0 07 Jun 2021
A Survey of Evaluation Metrics Used for NLG Systems Ananya B. Sai Akash Kumar Mohankumar Mitesh M. Khapra ELM 33 230 0 27 Aug 2020