Survey on Evaluation Methods for Dialogue Systems

10 May 2019

Papers citing "Survey on Evaluation Methods for Dialogue Systems"

50 / 51 papers shown

LLMs Get Lost In Multi-Turn Conversation Philippe Laban Hiroaki Hayashi Yingbo Zhou Jennifer Neville 50 1 0 09 May 2025
Analysis and Detection of Differences in Spoken User Behaviors between Autonomous and Wizard-of-Oz Systems Mikey Elmers Koji Inoue Divesh Lala Keiko Ochi Tatsuya Kawahara 23 1 0 04 Oct 2024
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations Lichao Zhang Jia Yu Shuai Zhang Long Li Yangyang Zhong ... Fangsheng Weng Fayu Pan Jing Li Renjun Xu Zhenzhong Lan 32 4 0 21 Jun 2024
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems Clemencia Siro Mohammad Aliannejadi Maarten de Rijke 43 3 0 15 Apr 2024
Apollonion: Profile-centric Dialog Agent Shangyu Chen Zibo Zhao Yuanyuan Zhao Xiang Li LLMAG 40 1 0 10 Apr 2024
Token Trails: Navigating Contextual Depths in Conversational AI with ChatLLM Md. Kowsher Ritesh Panditi Nusrat Jahan Prottasha Prakash Bhat A. Bairagi M. Arefin 28 1 0 03 Apr 2024
Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues Armand Stricker P. Paroubek 41 3 0 23 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges Mingqi Gao Xinyu Hu Jie Ruan Xiao Pu Xiaojun Wan ELM LM&MA 65 29 0 02 Feb 2024
Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets Armand Stricker P. Paroubek 34 0 0 23 Nov 2023
Evaluating Chatbots to Promote Users' Trust -- Practices and Open Problems Biplav Srivastava Kausik Lakkaraju T. Koppel Vignesh Narayanan Ashish Kundu Sachindra Joshi 34 2 0 09 Sep 2023
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models Qingyue Wang Y. Fu Yanan Cao Zhiliang Tian Shi Wang Dacheng Tao LLMAG KELM RALM 67 24 0 29 Aug 2023
GPT Self-Supervision for a Better Data Annotator Xiaohuan Pei Yanxi Li Chang Xu 30 7 0 07 Jun 2023
Toward More Accurate and Generalizable Evaluation Metrics for Task-Oriented Dialogs A. Komma Nagesh Panyam Chandrasekarasastry Timothy Leffel Anuj Kumar Goyal A. Metallinou Spyros Matsoukas Aram Galstyan 33 3 0 06 Jun 2023
Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues Yue Feng Yunlong Jiao Animesh Prasad Nikolaos Aletras Emine Yilmaz G. Kazai 24 5 0 26 May 2023
Psychological Metrics for Dialog System Evaluation Salvatore Giorgi Shreya Havaldar Farhan S. Ahmed Zuhaib Akhtar Shalaka Vaidya Gary Pan Pallavi V. Kulkarni H. A. Schwartz Joao Sedoc 22 2 0 24 May 2023
Modeling User Satisfaction Dynamics in Dialogue via Hawkes Process Fanghua Ye Zhiyuan Hu Emine Yilmaz 26 6 0 21 May 2023
Are LLMs All You Need for Task-Oriented Dialogue? Vojtěch Hudeček Ondřej Dušek 26 57 0 13 Apr 2023
Evaluating Human-Language Model Interaction Mina Lee Megha Srivastava Amelia Hardy John Thickstun Esin Durmus ... Hancheng Cao Tony Lee Rishi Bommasani Michael S. Bernstein Percy Liang LM&MA ALM 58 99 0 19 Dec 2022
On the Effectiveness of Automated Metrics for Text Generation Systems Pius von Däniken Jan Deriu Don Tuggener Mark Cieliebak 21 3 0 24 Oct 2022
Are Current Task-oriented Dialogue Systems Able to Satisfy Impolite Users? Zhiqiang Hu Roy Ka-Wei Lee Nancy F. Chen 32 4 0 24 Oct 2022
Evaluating Agent Interactions Through Episodic Knowledge Graphs Selene Báez Santamaría Piek Vossen T. Baier 28 2 0 22 Sep 2022
Semantic-based Pre-training for Dialogue Understanding Xuefeng Bai Linfeng Song Yue Zhang 38 7 0 19 Sep 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation Longxuan Ma Ziyu Zhuang Weinan Zhang Mingda Li Ting Liu 29 4 0 17 Aug 2022
Offline RL for Natural Language Generation with Implicit Language Q Learning Charles Burton Snell Ilya Kostrikov Yi Su Mengjiao Yang Sergey Levine OffRL 133 102 0 05 Jun 2022
DFM: Dialogue Foundation Model for Universal Large-Scale Dialogue-Oriented Task Learning Zhi Chen Jijia Bao Lu Chen Yuncong Liu Da Ma ... Xinhsuai Dong Fujiang Ge Qingliang Miao Jian-Guang Lou Kai Yu ALM AI4CE 45 3 0 25 May 2022
A Chit-Chats Enhanced Task-Oriented Dialogue Corpora for Fuse-Motive Conversation Systems Changhong Yu Chunhong Zhang Qibo Sun 31 1 0 12 May 2022
Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems Weiwei Sun Shuyu Guo Shuo Zhang Pengjie Ren Zhumin Chen Maarten de Rijke Z. Ren ELM 25 5 0 02 Apr 2022
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation Sarik Ghazarian Behnam Hedayatnia Alexandros Papangelis Yang Liu Dilek Z. Hakkani-Tür 30 19 0 25 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges Shikib Mehri Jinho Choi L. F. D’Haro Jan Deriu M. Eskénazi ... David Traum Yi-Ting Yeh Zhou Yu Yizhe Zhang Chen Zhang 30 21 0 18 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems Jan Deriu Don Tuggener Pius von Däniken Mark Cieliebak AAML 19 9 0 28 Feb 2022
A Systematic Literature Review on Persuasive Technology at the Workplace Kilian Wenker 19 14 0 02 Jan 2022
A Survey of Natural Language Generation Chenhe Dong Hai-Tao Zheng Haifan Gong Mengzhao Chen Junxin Li Ying Shen Min Yang 3DV 27 43 0 22 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems Chen Zhang João Sedoc L. F. D’Haro Rafael E. Banchs Alexander I. Rudnicky 22 36 0 03 Nov 2021
Every time I fire a conversational designer, the performance of the dialog system goes down Giancarlo A. Xompero Michele Mastromattei Samir Salman Cristina Giannone Andrea Favalli Raniero Romagnoli Fabio Massimo Zanzotto 24 0 0 27 Sep 2021
Actionable Conversational Quality Indicators for Improving Task-Oriented Dialog Systems Michael Higgins Dominic Widdows Chris Brew Gwen Christian Andrew Maurer ... Akshay Hazare George Bonev Beth Ann Hockey Kristen Howell Joe Bradley 14 0 0 22 Sep 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics Ananya B. Sai Tanay Dixit D. Y. Sheth S. Mohan Mitesh M. Khapra AAML 116 57 0 13 Sep 2021
How to Evaluate Your Dialogue Models: A Review of Approaches Xinmeng Li Wansen Wu Long Qin Quanjun Yin ELM 30 8 0 03 Aug 2021
A Comprehensive Assessment of Dialog Evaluation Metrics Yi-Ting Yeh M. Eskénazi Shikib Mehri 36 104 0 07 Jun 2021
LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing Yu Li Josh Arnold Feifan Yan Weiyan Shi Zhou Yu ELM 31 11 0 05 May 2021
Meta-evaluation of Conversational Search Evaluation Metrics Zeyang Liu K. Zhou Max L. Wilson ELM 32 17 0 27 Apr 2021
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems E. Razumovskaia Goran Glavaš Olga Majewska E. Ponti Anna Korhonen Ivan Vulić 23 32 0 17 Apr 2021
Advances and Challenges in Conversational Recommender Systems: A Survey Chongming Gao Wenqiang Lei Xiangnan He Maarten de Rijke Tat-Seng Chua 138 273 0 23 Jan 2021
Joint Turn and Dialogue level User Satisfaction Estimation on Multi-Domain Conversations Praveen Kumar Bodigutla Aditya Tiwari Josep Valls-Vargas L. Polymenakos Spyros Matsoukas 16 32 0 06 Oct 2020
Evaluation of Text Generation: A Survey Asli Celikyilmaz Elizabeth Clark Jianfeng Gao ELM LM&MA 19 376 0 26 Jun 2020
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols Sarah E. Finch Jinho Choi ELM 29 67 0 10 Jun 2020
Report from the NSF Future Directions Workshop, Toward User-Oriented Agents: Research Directions and Challenges M. Eskénazi Tiancheng Zhao LLMAG AI4TS AI4CE 36 9 0 10 Jun 2020
A Survey of Document Grounded Dialogue Systems (DGDS) Longxuan Ma Weinan Zhang Mingda Li Ting Liu 27 19 0 17 Apr 2020
Multi-domain Conversation Quality Evaluation via User Satisfaction Estimation Praveen Kumar Bodigutla L. Polymenakos Spyros Matsoukas 13 21 0 18 Nov 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 299 6,984 0 20 Apr 2018
Adversarial Evaluation of Dialogue Models Anjuli Kannan Oriol Vinyals AAML ALM 141 76 0 27 Jan 2017