arXiv:1905.04071
Survey on Evaluation Methods for Dialogue Systems
10 May 2019
Jan Deriu
Álvaro Rodrigo
Arantxa Otegi
Guillermo Echegoyen
S. Rosset
Eneko Agirre
Mark Cieliebak
ArXiv
Papers citing "Survey on Evaluation Methods for Dialogue Systems" (50 of 51 shown)
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
50
1
0
09 May 2025
Analysis and Detection of Differences in Spoken User Behaviors between Autonomous and Wizard-of-Oz Systems
Mikey Elmers
Koji Inoue
Divesh Lala
Keiko Ochi
Tatsuya Kawahara
23
1
0
04 Oct 2024
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Lichao Zhang
Jia Yu
Shuai Zhang
Long Li
Yangyang Zhong
...
Fangsheng Weng
Fayu Pan
Jing Li
Renjun Xu
Zhenzhong Lan
32
4
0
21 Jun 2024
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems
Clemencia Siro
Mohammad Aliannejadi
Maarten de Rijke
43
3
0
15 Apr 2024
Apollonion: Profile-centric Dialog Agent
Shangyu Chen
Zibo Zhao
Yuanyuan Zhao
Xiang Li
LLMAG
40
1
0
10 Apr 2024
Token Trails: Navigating Contextual Depths in Conversational AI with ChatLLM
Md. Kowsher
Ritesh Panditi
Nusrat Jahan Prottasha
Prakash Bhat
A. Bairagi
M. Arefin
28
1
0
03 Apr 2024
Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues
Armand Stricker
P. Paroubek
41
3
0
23 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
65
29
0
02 Feb 2024
Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets
Armand Stricker
P. Paroubek
34
0
0
23 Nov 2023
Evaluating Chatbots to Promote Users' Trust -- Practices and Open Problems
Biplav Srivastava
Kausik Lakkaraju
T. Koppel
Vignesh Narayanan
Ashish Kundu
Sachindra Joshi
34
2
0
09 Sep 2023
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Qingyue Wang
Y. Fu
Yanan Cao
Zhiliang Tian
Shi Wang
Dacheng Tao
LLMAG
KELM
RALM
67
24
0
29 Aug 2023
GPT Self-Supervision for a Better Data Annotator
Xiaohuan Pei
Yanxi Li
Chang Xu
30
7
0
07 Jun 2023
Toward More Accurate and Generalizable Evaluation Metrics for Task-Oriented Dialogs
A. Komma
Nagesh Panyam Chandrasekarasastry
Timothy Leffel
Anuj Kumar Goyal
A. Metallinou
Spyros Matsoukas
Aram Galstyan
33
3
0
06 Jun 2023
Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues
Yue Feng
Yunlong Jiao
Animesh Prasad
Nikolaos Aletras
Emine Yilmaz
G. Kazai
24
5
0
26 May 2023
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. A. Schwartz
Joao Sedoc
22
2
0
24 May 2023
Modeling User Satisfaction Dynamics in Dialogue via Hawkes Process
Fanghua Ye
Zhiyuan Hu
Emine Yilmaz
26
6
0
21 May 2023
Are LLMs All You Need for Task-Oriented Dialogue?
Vojtěch Hudeček
Ondrej Dusek
26
57
0
13 Apr 2023
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
58
99
0
19 Dec 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
Pius von Däniken
Jan Deriu
Don Tuggener
Mark Cieliebak
21
3
0
24 Oct 2022
Are Current Task-oriented Dialogue Systems Able to Satisfy Impolite Users?
Zhiqiang Hu
Roy Ka-Wei Lee
Nancy F. Chen
32
4
0
24 Oct 2022
Evaluating Agent Interactions Through Episodic Knowledge Graphs
Selene Báez Santamaría
Piek Vossen
T. Baier
28
2
0
22 Sep 2022
Semantic-based Pre-training for Dialogue Understanding
Xuefeng Bai
Linfeng Song
Yue Zhang
38
7
0
19 Sep 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
Longxuan Ma
Ziyu Zhuang
Weinan Zhang
Mingda Li
Ting Liu
29
4
0
17 Aug 2022
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
133
102
0
05 Jun 2022
DFM: Dialogue Foundation Model for Universal Large-Scale Dialogue-Oriented Task Learning
Zhi Chen
Jijia Bao
Lu Chen
Yuncong Liu
Da Ma
...
Xinhsuai Dong
Fujiang Ge
Qingliang Miao
Jian-Guang Lou
Kai Yu
ALM
AI4CE
45
3
0
25 May 2022
A Chit-Chats Enhanced Task-Oriented Dialogue Corpora for Fuse-Motive Conversation Systems
Changhong Yu
Chunhong Zhang
Qibo Sun
31
1
0
12 May 2022
Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems
Weiwei Sun
Shuyu Guo
Shuo Zhang
Pengjie Ren
Zhumin Chen
Maarten de Rijke
Z. Ren
ELM
25
5
0
02 Apr 2022
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Sarik Ghazarian
Behnam Hedayatnia
Alexandros Papangelis
Yang Liu
Dilek Z. Hakkani-Tür
30
19
0
25 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
30
21
0
18 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems
Jan Deriu
Don Tuggener
Pius von Däniken
Mark Cieliebak
AAML
19
9
0
28 Feb 2022
A Systematic Literature Review on Persuasive Technology at the Workplace
Kilian Wenker
19
14
0
02 Jan 2022
A Survey of Natural Language Generation
Chenhe Dong
Hai-Tao Zheng
Haifan Gong
Mengzhao Chen
Junxin Li
Ying Shen
Min Yang
3DV
27
43
0
22 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
Chen Zhang
João Sedoc
L. F. D’Haro
Rafael E. Banchs
Alexander I. Rudnicky
22
36
0
03 Nov 2021
Every time I fire a conversational designer, the performance of the dialog system goes down
Giancarlo A. Xompero
Michele Mastromattei
Samir Salman
Cristina Giannone
Andrea Favalli
Raniero Romagnoli
Fabio Massimo Zanzotto
24
0
0
27 Sep 2021
Actionable Conversational Quality Indicators for Improving Task-Oriented Dialog Systems
Michael Higgins
Dominic Widdows
Chris Brew
Gwen Christian
Andrew Maurer
...
Akshay Hazare
George Bonev
Beth Ann Hockey
Kristen Howell
Joe Bradley
14
0
0
22 Sep 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
116
57
0
13 Sep 2021
How to Evaluate Your Dialogue Models: A Review of Approaches
Xinmeng Li
Wansen Wu
Long Qin
Quanjun Yin
ELM
30
8
0
03 Aug 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
36
104
0
07 Jun 2021
LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing
Yu Li
Josh Arnold
Feifan Yan
Weiyan Shi
Zhou Yu
ELM
31
11
0
05 May 2021
Meta-evaluation of Conversational Search Evaluation Metrics
Zeyang Liu
K. Zhou
Max L. Wilson
ELM
32
17
0
27 Apr 2021
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems
E. Razumovskaia
Goran Glavaš
Olga Majewska
E. Ponti
Anna Korhonen
Ivan Vulić
23
32
0
17 Apr 2021
Advances and Challenges in Conversational Recommender Systems: A Survey
Chongming Gao
Wenqiang Lei
Xiangnan He
Maarten de Rijke
Tat-Seng Chua
138
273
0
23 Jan 2021
Joint Turn and Dialogue level User Satisfaction Estimation on Multi-Domain Conversations
Praveen Kumar Bodigutla
Aditya Tiwari
Josep Valls-Vargas
L. Polymenakos
Spyros Matsoukas
16
32
0
06 Oct 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
376
0
26 Jun 2020
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols
Sarah E. Finch
Jinho Choi
ELM
29
67
0
10 Jun 2020
Report from the NSF Future Directions Workshop, Toward User-Oriented Agents: Research Directions and Challenges
M. Eskénazi
Tiancheng Zhao
LLMAG
AI4TS
AI4CE
36
9
0
10 Jun 2020
A Survey of Document Grounded Dialogue Systems (DGDS)
Longxuan Ma
Weinan Zhang
Mingda Li
Ting Liu
27
19
0
17 Apr 2020
Multi-domain Conversation Quality Evaluation via User Satisfaction Estimation
Praveen Kumar Bodigutla
L. Polymenakos
Spyros Matsoukas
13
21
0
18 Nov 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018
Adversarial Evaluation of Dialogue Models
Anjuli Kannan
Oriol Vinyals
AAML
ALM
141
76
0
27 Jan 2017