2203.10012
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
18 March 2022
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
Milica Gasic
Kallirroi Georgila
Dilek Z. Hakkani-Tür
Zekang Li
Verena Rieser
Samira Shaikh
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
Papers citing
"Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges"
50 / 63 papers shown
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems
Jan Deriu
Don Tuggener
Pius von Daniken
Mark Cieliebak
AAML
26
9
0
28 Feb 2022
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents
Eric Michael Smith
Orion Hsu
Rebecca Qian
Stephen Roller
Y-Lan Boureau
Jason Weston
55
67
0
12 Jan 2022
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang
L. F. D’Haro
Thomas Friedrichs
Haizhou Li
ELM
42
18
0
14 Dec 2021
A Survey of NLP-Related Crowdsourcing HITs: what works and what does not
Jessica Huynh
Jeffrey P. Bigham
M. Eskénazi
72
18
0
09 Nov 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
Chen Zhang
João Sedoc
L. F. D’Haro
Rafael E. Banchs
Alexander I. Rudnicky
40
37
0
03 Nov 2021
Modeling Performance in Open-Domain Dialogue with PARADISE
M. Walker
Colin Harmon
James Graupera
Davan Harrison
S. Whittaker
44
7
0
21 Oct 2021
Better than Average: Paired Evaluation of NLP Systems
Maxime Peyrard
Wei Zhao
Steffen Eger
Robert West
ELM
53
24
0
20 Oct 2021
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results
M. Fomicheva
Piyawat Lertvittayakumjorn
Wei Zhao
Steffen Eger
Yang Gao
ELM
46
40
0
08 Oct 2021
ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI
Amanda Cercas Curry
Gavin Abercrombie
Verena Rieser
48
79
0
20 Sep 2021
POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling
Zeyang Liu
K. Zhou
Jiaxin Mao
Max L. Wilson
60
2
0
07 Sep 2021
Language Model Augmented Relevance Score
Ruibo Liu
Jason W. Wei
Soroush Vosoughi
32
10
0
19 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
49
105
0
07 Jul 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
83
405
0
30 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
59
107
0
07 Jun 2021
Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency
Zekang Li
Jinchao Zhang
Zhengcong Fei
Yang Feng
Jie Zhou
31
14
0
04 Jun 2021
Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances
Zekang Li
Jinchao Zhang
Zhengcong Fei
Yang Feng
Jie Zhou
38
57
0
04 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation
Chen Zhang
Yiming Chen
L. F. D’Haro
Yan Zhang
Thomas Friedrichs
Grandee Lee
Haizhou Li
39
73
0
02 Jun 2021
HERALD: An Annotation Efficient Method to Detect User Disengagement in Social Conversations
Weixin Liang
Kai-Hui Liang
Zhou Yu
58
15
0
01 Jun 2021
Assessing Dialogue Systems with Distribution Distances
Jiannan Xiang
Yahui Liu
Deng Cai
Huayang Li
Defu Lian
Lemao Liu
24
18
0
06 May 2021
Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation
Markus Freitag
George F. Foster
David Grangier
Viresh Ratnakar
Qijun Tan
Wolfgang Macherey
137
382
0
29 Apr 2021
QuestEval: Summarization Asks for Fact-based Evaluation
Thomas Scialom
Paul-Alexis Dray
Patrick Gallinari
Sylvain Lamprier
Benjamin Piwowarski
Jacopo Staiano
Alex Jinpeng Wang
HILM
46
273
0
23 Mar 2021
Overview of the Ninth Dialog System Technology Challenge: DSTC9
Chulaka Gunasekara
Seokhwan Kim
L. F. D’Haro
Abhinav Rastogi
Yun-Nung Chen
...
A. Geramifard
Satwik Kottur
Seungwhan Moon
Shivani Poddar
R. Subba
88
75
0
12 Nov 2020
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
Vitou Phy
Yang Zhao
Akiko Aizawa
26
55
0
01 Nov 2020
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
Lishan Huang
Zheng Ye
Jinghui Qin
Liang Lin
Xiaodan Liang
36
103
0
08 Oct 2020
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
Shikib Mehri
Mihail Eric
Dilek Z. Hakkani-Tür
ELM
48
136
0
28 Sep 2020
Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining
Ananya B. Sai
Akash Kumar Mohankumar
Siddharth Arora
Mitesh M. Khapra
40
74
0
23 Sep 2020
Dialogue Response Ranking Training with Large-Scale Human Feedback Data
Xiang Gao
Yizhe Zhang
Michel Galley
Chris Brockett
Bill Dolan
ALM
54
105
0
15 Sep 2020
MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines
Xiaoxue Zang
Abhinav Rastogi
Srinivas Sunkara
Raghav Gupta
Jianguo Zhang
Jindong Chen
65
276
0
10 Jul 2020
Unsupervised Evaluation of Interactive Dialog with DialoGPT
Shikib Mehri
M. Eskénazi
54
177
0
23 Jun 2020
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols
Sarah E. Finch
Jinho Choi
ELM
56
67
0
10 Jun 2020
Learning an Unreferenced Metric for Online Dialogue Evaluation
Koustuv Sinha
Prasanna Parthasarathi
Jasmine Wang
Ryan J. Lowe
William L. Hamilton
Joelle Pineau
OffRL
50
84
0
01 May 2020
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
Shikib Mehri
M. Eskénazi
41
222
0
01 May 2020
Recipes for building an open-domain chatbot
Stephen Roller
Emily Dinan
Naman Goyal
Da Ju
Mary Williamson
...
Myle Ott
Kurt Shuster
Eric Michael Smith
Y-Lan Boureau
Jason Weston
ALM
113
1,001
0
28 Apr 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
77
1,472
0
09 Apr 2020
Towards a Human-like Open-Domain Chatbot
Daniel De Freitas
Minh-Thang Luong
David R. So
Jamie Hall
Noah Fiedel
...
Zi Yang
Apoorv Kulshreshtha
Gaurav Nemade
Yifeng Lu
Quoc V. Le
77
931
0
27 Jan 2020
Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems
Sarik Ghazarian
R. Weischedel
Aram Galstyan
Nanyun Peng
41
56
0
04 Nov 2019
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons
Margaret Li
Jason Weston
Stephen Roller
56
176
0
06 Sep 2019
Neural Text Generation with Unlikelihood Training
Sean Welleck
Ilia Kulikov
Stephen Roller
Emily Dinan
Kyunghyun Cho
Jason Weston
MU
37
570
0
12 Aug 2019
Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References
Prakhar Gupta
Shikib Mehri
Tiancheng Zhao
Amy Pavel
M. Eskénazi
Jeffrey P. Bigham
58
86
0
24 Jul 2019
Incremental Transformer with Deliberation Decoder for Document Grounded Conversations
Zekang Li
Cheng Niu
Fandong Meng
Yang Feng
Q. Li
Jie Zhou
58
115
0
20 Jul 2019
Survey on Evaluation Methods for Dialogue Systems
Jan Deriu
Álvaro Rodrigo
Arantxa Otegi
Guillermo Echegoyen
S. Rosset
Eneko Agirre
Mark Cieliebak
50
280
0
10 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
205
2,296
0
02 May 2019
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings
Sarik Ghazarian
Johnny Tian-Zheng Wei
Aram Galstyan
Nanyun Peng
39
90
0
24 Apr 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
243
5,668
0
21 Apr 2019
Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses
Ananya B. Sai
Mithun Das Gupta
Mitesh M. Khapra
Mukundhan Srinivasan
47
48
0
23 Feb 2019
The Second Conversational Intelligence Challenge (ConvAI2)
Emily Dinan
V. Logacheva
Valentin Malykh
Alexander H. Miller
Kurt Shuster
...
Alexander I. Rudnicky
Jason Williams
Joelle Pineau
Andrey Kravchenko
Jason Weston
DRL
94
363
0
31 Jan 2019
Beyond Turing: Intelligent Agents Centered on the User
M. Eskénazi
Shikib Mehri
E. Razumovskaia
Tiancheng Zhao
LLMAG
48
12
0
20 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.2K
93,936
0
11 Oct 2018
MultiWOZ -- A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
Paweł Budzianowski
Tsung-Hsien Wen
Bo-Hsiang Tseng
I. Casanueva
Stefan Ultes
Osman Ramadan
Milica Gasic
144
1,306
0
29 Sep 2018
Retrieval-Enhanced Adversarial Training for Neural Response Generation
Shuai Yang
Jiaying Liu
Wenjing Wang
Furu Wei
Zongming Guo
RALM
33
84
0
12 Sep 2018