arXiv:2312.15407
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
24 December 2023
Chen Zhang
L. F. D’Haro
Yiming Chen
Malu Zhang
Haizhou Li
ELM
Papers citing "A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators"
23 / 23 papers shown
Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey
Katerina Korre
Dimitris Tsirmpas
Nikos Gkoumas
Emma Cabalé
Danai Myrtzani
Theodoros Evgeniou
Ion Androutsopoulos
83
2
0
03 Mar 2025
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
97
2
0
17 Oct 2024
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang
Dading Chong
Feng Jiang
Chengguang Tang
Anningzhe Gao
Guohua Tang
Haizhou Li
ALM
81
2
0
20 Sep 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
174
40
0
02 Feb 2024
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Qingyue Wang
Y. Fu
Yanan Cao
Zhiliang Tian
Shi Wang
Dacheng Tao
LLMAG
KELM
RALM
123
29
0
29 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
391
4,388
0
09 Jun 2023
Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation
Jessica Huynh
Cathy Jiao
Prakhar Gupta
Shikib Mehri
Payal Bajaj
Vishrav Chaudhary
M. Eskénazi
ELM
LM&MA
52
17
0
27 Jan 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
208
1,634
0
15 Dec 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
394
2,392
0
09 Nov 2022
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
63
16
0
25 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
355
1,094
0
05 Oct 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Prakhar Gupta
Cathy Jiao
Yi-Ting Yeh
Shikib Mehri
M. Eskénazi
Jeffrey P. Bigham
ALM
96
48
0
25 May 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
512
6,279
0
05 Apr 2022
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Tianbo Ji
Yvette Graham
Gareth J. F. Jones
Chenyang Lyu
Qun Liu
ALM
54
39
0
11 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
880
13,148
0
04 Mar 2022
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang
L. F. D’Haro
Thomas Friedrichs
Haizhou Li
ELM
59
19
0
14 Dec 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
211
3,778
0
03 Sep 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
66
109
0
07 Jun 2021
MIME: MIMicking Emotions for Empathetic Response Generation
Navonil Majumder
Pengfei Hong
Shanshan Peng
Jiankun Lu
Deepanway Ghosal
Alexander Gelbukh
Rada Mihalcea
Soujanya Poria
72
201
0
04 Oct 2020
Unsupervised Evaluation of Interactive Dialog with DialoGPT
Shikib Mehri
M. Eskénazi
59
178
0
23 Jun 2020
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
Shikib Mehri
M. Eskénazi
67
225
0
01 May 2020
Designing Precise and Robust Dialogue Response Evaluators
Tianyu Zhao
Divesh Lala
Tatsuya Kawahara
44
53
0
10 Apr 2020
Personalizing Dialogue Agents: I have a dog, do you have pets too?
Saizheng Zhang
Emily Dinan
Jack Urbanek
Arthur Szlam
Douwe Kiela
Jason Weston
118
1,464
0
22 Jan 2018