Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References

24 July 2019

Papers citing "Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References"

22 / 22 papers shown

BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation; Suvodip Dey, M. Desarkar; [OffRL]; 46 0 0; 20 Jan 2025
Psychological Metrics for Dialog System Evaluation; Salvatore Giorgi, Shreya Havaldar, Farhan S. Ahmed, Zuhaib Akhtar, Shalaka Vaidya, Gary Pan, Pallavi V. Kulkarni, H. Andrew Schwartz, Joao Sedoc; 24 2 0; 24 May 2023
PAL: Persona-Augmented Emotional Support Conversation Generation; Jiale Cheng, Sahand Sabour, Hao Sun, Zhuang Chen, Minlie Huang; 37 28 0; 19 Dec 2022
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment; Chen Zhang, L. F. D’Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li; 38 7 0; 18 Dec 2022
There Is No Standard Answer: Knowledge-Grounded Dialogue Generation with Adversarial Activated Multi-Reference Learning; Xueliang Zhao, Tingchen Fu, Chongyang Tao, Rui Yan; 20 4 0; 22 Oct 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue; Pengfei Zhang, Xiao-fei Hu, Kaidong Yu, Jian Wang, Song-Bo Han, Cao Liu, C. Yuan; 27 7 0; 19 Jun 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning; Prakhar Gupta, Cathy Jiao, Yi-Ting Yeh, Shikib Mehri, M. Eskénazi, Jeffrey P. Bigham; [ALM]; 46 47 0; 25 May 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges; Shikib Mehri, Jinho Choi, L. F. D’Haro, Jan Deriu, M. Eskénazi, ..., David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang; 34 21 0; 18 Mar 2022
Ditch the Gold Standard: Re-evaluating Conversational Question Answering; Huihan Li, Tianyu Gao, Manan Goenka, Danqi Chen; 24 21 0; 16 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems; Chen Zhang, João Sedoc, L. F. D’Haro, Rafael E. Banchs, Alexander I. Rudnicky; 22 36 0; 03 Nov 2021
Investigating Robustness of Dialog Models to Popular Figurative Language Constructs; Harsh Jhamtani, Varun Gangal, Eduard H. Hovy, Taylor Berg-Kirkpatrick; 28 21 0; 01 Oct 2021
Enhancing Self-Disclosure In Neural Dialog Models By Candidate Re-ranking; Mayank Soni, Benjamin R. Cowan, Vincent P. Wade; 42 4 0; 10 Sep 2021
How to Evaluate Your Dialogue Models: A Review of Approaches; Xinmeng Li, Wansen Wu, Long Qin, Quanjun Yin; [ELM]; 30 8 0; 03 Aug 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation; Prakhar Gupta, Yulia Tsvetkov, Jeffrey P. Bigham; 47 22 0; 10 Jun 2021
Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances; Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, Jie Zhou; 22 57 0; 04 Jun 2021
HERALD: An Annotation Efficient Method to Detect User Disengagement in Social Conversations; Weixin Liang, Kai-Hui Liang, Zhou Yu; 45 15 0; 01 Jun 2021
ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation; Endri Kacupaj, Barshana Banerjee, Kuldeep Singh, Jens Lehmann; 31 17 0; 13 Mar 2021
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale; Ozan Caglayan, Pranava Madhyastha, Lucia Specia; [ELM]; 39 35 0; 26 Oct 2020
Report from the NSF Future Directions Workshop, Toward User-Oriented Agents: Research Directions and Challenges; M. Eskénazi, Tiancheng Zhao; [LLMAG, AI4TS, AI4CE]; 36 9 0; 10 Jun 2020
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation; Weixin Liang, James Zou, Zhou Yu; [ELM]; 34 33 0; 21 May 2020
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation; Shikib Mehri, M. Eskénazi; 17 219 0; 01 May 2020
Deep Reinforcement Learning for Dialogue Generation; Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky; 220 1,328 0; 05 Jun 2016