arXiv:2201.04723
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents
12 January 2022
Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston
Papers citing "Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents" (17 papers)
Mirror: Multimodal Cognitive Reframing Therapy for Rolling with Resistance
Subin Kim, Hoonrae Kim, Jihyun Lee, Yejin Jeon, Gary Geunbae Lee
OffRL
16 Apr 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi, J. Eisner, Corby Rosset, Benjamin Van Durme, Chris Kedzie
03 Jan 2025
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho
HILM, ELM, AILaw
30 May 2024
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi, Shreya Havaldar, Farhan S. Ahmed, Zuhaib Akhtar, Shalaka Vaidya, Gary Pan, Pallavi V. Kulkarni, H. A. Schwartz, João Sedoc
24 May 2023
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs
IokTong Lei, Zhidong Deng
ReLM, RALM, LRM
19 May 2023
Prompted LLMs as Chatbot Modules for Long Open-domain Conversation
Gibbeum Lee, Volker Hartmann, Jongho Park, Dimitris Papailiopoulos, Kangwook Lee
08 May 2023
Diving Deep into Modes of Fact Hallucinations in Dialogue Systems
Souvik Das, Sougata Saha, R. Srihari
HILM
11 Jan 2023
Evaluating Human-Language Model Interaction
Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, ..., Hancheng Cao, Tony Lee, Rishi Bommasani, Michael S. Bernstein, Percy Liang
LM&MA, ALM
19 Dec 2022
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
Chen Zhang, L. F. D’Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li
18 Dec 2022
Keep Me Updated! Memory Management in Long-term Conversations
Sanghwan Bae, Donghyun Kwak, Soyoung Kang, Min Young Lee, Sungdong Kim, Yuin Jeong, Hyeri Kim, Sang-Woo Lee, W. Park, Nako Sung
17 Oct 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu, Siqi Bao, H. He, Fan Wang, Hua-Hong Wu, Haifeng Wang
ALM
30 Aug 2022
Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems
Weiwei Sun, Shuyu Guo, Shuo Zhang, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Z. Ren
ELM
02 Apr 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri, Jinho Choi, L. F. D’Haro, Jan Deriu, M. Eskénazi, ..., David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang
18 Mar 2022
A Survey of NLP-Related Crowdsourcing HITs: what works and what does not
Jessica Huynh, Jeffrey P. Bigham, M. Eskénazi
09 Nov 2021
Reason first, then respond: Modular Generation for Knowledge-infused Dialogue
Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston
KELM, LRM
09 Nov 2021
Internet-Augmented Dialogue Generation
M. Komeili, Kurt Shuster, Jason Weston
RALM
15 Jul 2021
An Evaluation Protocol for Generative Conversational Systems
Seolhwa Lee, Heuiseok Lim, João Sedoc
ELM
24 Oct 2020