Human Evaluation of Conversations is an Open Problem: comparing the
sensitivity of various methods for evaluating dialogue agents

12 January 2022

Eric Michael Smith

Jason Weston

Papers citing "Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents"

17 / 17 papers shown

Mirror: Multimodal Cognitive Reframing Therapy for Rolling with Resistance Subin Kim Hoonrae Kim Jihyun Lee Yejin Jeon Gary Geunbae Lee OffRL 29 0 0 16 Apr 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts Helia Hashemi J. Eisner Corby Rosset Benjamin Van Durme Chris Kedzie 68 1 0 03 Jan 2025
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools Varun Magesh Faiz Surani Matthew Dahl Mirac Suzgun Christopher D. Manning Daniel E. Ho HILM ELM AILaw 27 66 0 30 May 2024
Psychological Metrics for Dialog System Evaluation Salvatore Giorgi Shreya Havaldar Farhan S. Ahmed Zuhaib Akhtar Shalaka Vaidya Gary Pan Pallavi V. Kulkarni H. A. Schwartz Joao Sedoc 22 2 0 24 May 2023
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs IokTong Lei Zhidong Deng ReLM RALM LRM 19 4 0 19 May 2023
Prompted LLMs as Chatbot Modules for Long Open-domain Conversation Gibbeum Lee Volker Hartmann Jongho Park Dimitris Papailiopoulos Kangwook Lee 24 62 0 08 May 2023
Diving Deep into Modes of Fact Hallucinations in Dialogue Systems Souvik Das Sougata Saha R. Srihari HILM 15 30 0 11 Jan 2023
Evaluating Human-Language Model Interaction Mina Lee Megha Srivastava Amelia Hardy John Thickstun Esin Durmus ... Hancheng Cao Tony Lee Rishi Bommasani Michael S. Bernstein Percy Liang LM&MA ALM 58 99 0 19 Dec 2022
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment Chen Zhang L. F. D’Haro Qiquan Zhang Thomas Friedrichs Haizhou Li 26 7 0 18 Dec 2022
Keep Me Updated! Memory Management in Long-term Conversations Sanghwan Bae Donghyun Kwak Soyoung Kang Min Young Lee Sungdong Kim Yuin Jeong Hyeri Kim Sang-Woo Lee W. Park Nako Sung 40 46 0 17 Oct 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback Hua Lu Siqi Bao H. He Fan Wang Hua-Hong Wu Haifeng Wang ALM 20 18 0 30 Aug 2022
Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems Weiwei Sun Shuyu Guo Shuo Zhang Pengjie Ren Zhumin Chen Maarten de Rijke Z. Ren ELM 25 5 0 02 Apr 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges Shikib Mehri Jinho Choi L. F. D’Haro Jan Deriu M. Eskénazi ... David Traum Yi-Ting Yeh Zhou Yu Yizhe Zhang Chen Zhang 30 21 0 18 Mar 2022
A Survey of NLP-Related Crowdsourcing HITs: what works and what does not Jessica Huynh Jeffrey P. Bigham M. Eskénazi 46 18 0 09 Nov 2021
Reason first, then respond: Modular Generation for Knowledge-infused Dialogue Leonard Adolphs Kurt Shuster Jack Urbanek Arthur Szlam Jason Weston KELM LRM 204 41 0 09 Nov 2021
Internet-Augmented Dialogue Generation M. Komeili Kurt Shuster Jason Weston RALM 238 280 0 15 Jul 2021
An Evaluation Protocol for Generative Conversational Systems Seolhwa Lee Heuiseok Lim João Sedoc ELM 35 10 0 24 Oct 2020