ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2201.04723
  4. Cited By
Human Evaluation of Conversations is an Open Problem: comparing the
  sensitivity of various methods for evaluating dialogue agents

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

12 January 2022
Eric Michael Smith
Orion Hsu
Rebecca Qian
Stephen Roller
Y-Lan Boureau
Jason Weston
ArXivPDFHTML

Papers citing "Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents"

17 / 17 papers shown
Title
Mirror: Multimodal Cognitive Reframing Therapy for Rolling with Resistance
Mirror: Multimodal Cognitive Reframing Therapy for Rolling with Resistance
Subin Kim
Hoonrae Kim
Jihyun Lee
Yejin Jeon
Gary Geunbae Lee
OffRL
29
0
0
16 Apr 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
68
1
0
03 Jan 2025
Hallucination-Free? Assessing the Reliability of Leading AI Legal
  Research Tools
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
27
66
0
30 May 2024
Psychological Metrics for Dialog System Evaluation
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. A. Schwartz
Joao Sedoc
22
2
0
24 May 2023
Hint of Thought prompting: an explainable and zero-shot approach to
  reasoning tasks with LLMs
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs
IokTong Lei
Zhidong Deng
ReLM
RALM
LRM
19
4
0
19 May 2023
Prompted LLMs as Chatbot Modules for Long Open-domain Conversation
Prompted LLMs as Chatbot Modules for Long Open-domain Conversation
Gibbeum Lee
Volker Hartmann
Jongho Park
Dimitris Papailiopoulos
Kangwook Lee
24
62
0
08 May 2023
Diving Deep into Modes of Fact Hallucinations in Dialogue Systems
Diving Deep into Modes of Fact Hallucinations in Dialogue Systems
Souvik Das
Sougata Saha
R. Srihari
HILM
15
30
0
11 Jan 2023
Evaluating Human-Language Model Interaction
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
58
99
0
19 Dec 2022
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
26
7
0
18 Dec 2022
Keep Me Updated! Memory Management in Long-term Conversations
Keep Me Updated! Memory Management in Long-term Conversations
Sanghwan Bae
Donghyun Kwak
Soyoung Kang
Min Young Lee
Sungdong Kim
Yuin Jeong
Hyeri Kim
Sang-Woo Lee
W. Park
Nako Sung
40
46
0
17 Oct 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua-Hong Wu
Haifeng Wang
ALM
20
18
0
30 Aug 2022
Metaphorical User Simulators for Evaluating Task-oriented Dialogue
  Systems
Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems
Weiwei Sun
Shuyu Guo
Shuo Zhang
Pengjie Ren
Zhumin Chen
Maarten de Rijke
Z. Ren
ELM
25
5
0
02 Apr 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation
  of Dialog: Research Directions and Challenges
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
30
21
0
18 Mar 2022
A Survey of NLP-Related Crowdsourcing HITs: what works and what does not
A Survey of NLP-Related Crowdsourcing HITs: what works and what does not
Jessica Huynh
Jeffrey P. Bigham
M. Eskénazi
46
18
0
09 Nov 2021
Reason first, then respond: Modular Generation for Knowledge-infused
  Dialogue
Reason first, then respond: Modular Generation for Knowledge-infused Dialogue
Leonard Adolphs
Kurt Shuster
Jack Urbanek
Arthur Szlam
Jason Weston
KELM
LRM
204
41
0
09 Nov 2021
Internet-Augmented Dialogue Generation
Internet-Augmented Dialogue Generation
M. Komeili
Kurt Shuster
Jason Weston
RALM
238
280
0
15 Jul 2021
An Evaluation Protocol for Generative Conversational Systems
An Evaluation Protocol for Generative Conversational Systems
Seolhwa Lee
Heuiseok Lim
Jo˜ao Sedoc
ELM
35
10
0
24 Oct 2020
1