2305.17926
Cited By
v1
v2 (latest)
Large Language Models are not Fair Evaluators
29 May 2023
Peiyi Wang
Lei Li
Liang Chen
Zefan Cai
Dawei Zhu
Binghuai Lin
Yunbo Cao
Qi Liu
Tianyu Liu
Zhifang Sui
ALM
ArXiv (abs)
PDF
HTML
Github (137★)
Papers citing
"Large Language Models are not Fair Evaluators"
33 / 133 papers shown
Title
Latent Concept-based Explanation of NLP Models
Xuemin Yu
Fahim Dalvi
Nadir Durrani
Marzia Nouri
Hassan Sajjad
LRM
FAtt
59
3
0
18 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
176
403
0
06 Apr 2024
Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Chen Huang
Peixin Qin
Yang Deng
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
122
8
0
04 Apr 2024
Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
Shu Yang
Jiayuan Su
Han Jiang
Mengdi Li
Keyuan Cheng
Muhammad Asif Ali
Lijie Hu
Di Wang
106
6
0
30 Mar 2024
MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation
Yu Li
Shenyu Zhang
Rui Wu
Xiutian Huang
Yongrui Chen
Wenhao Xu
Guilin Qi
Dehai Min
LLMAG
67
11
0
28 Mar 2024
Can multiple-choice questions really be useful in detecting the abilities of LLMs?
Wangyue Li
Liangzhi Li
Tong Xiang
Xiao Liu
Wei Deng
Noa Garcia
ELM
118
35
0
26 Mar 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
131
58
0
01 Mar 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
127
9
0
27 Feb 2024
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
Chenyang Lyu
Minghao Wu
Alham Fikri Aji
ELM
66
14
0
21 Feb 2024
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation
Xunjian Yin
Xu Zhang
Jie Ruan
Xiaojun Wan
ELM
112
24
0
18 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
224
41
0
02 Feb 2024
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
Xueyu Hu
Ziyu Zhao
Shuang Wei
Ziwei Chai
Qianli Ma
...
Jiwei Li
Kun Kuang
Yang Yang
Hongxia Yang
Leilei Gan
LMTD
ELM
96
58
0
10 Jan 2024
LLM-SQL-Solver: Can LLMs Determine SQL Equivalence?
Fuheng Zhao
Lawrence Lim
Ishtiyaque Ahmad
D. Agrawal
Amr El Abbadi
121
13
0
16 Dec 2023
LLMEval: A Preliminary Study on How to Evaluate Large Language Models
Yue Zhang
Ming Zhang
Haipeng Yuan
Shichun Liu
Yongyao Shi
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
ELM
69
15
0
12 Dec 2023
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
Wenjie Mo
Lyne Tchapmi
Qin Liu
Jiong Wang
Jun Yan
Chaowei Xiao
Muhao Chen
AAML
148
20
0
16 Nov 2023
When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour
Leonardo Ranaldi
Giulia Pucci
72
34
0
15 Nov 2023
Instructive Dialogue Summarization with Query Aggregations
Bin Wang
Zhengyuan Liu
Nancy F. Chen
89
3
0
17 Oct 2023
On Context Utilization in Summarization with Large Language Models
Mathieu Ravaut
Aixin Sun
Nancy F. Chen
Shafiq Joty
93
14
0
16 Oct 2023
Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu
Ting-En Lin
Yuchuan Wu
Min Yang
Fei Huang
Yongbin Li
ALM
104
9
0
10 Oct 2023
Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
Liang Chen
Yichi Zhang
Shuhuai Ren
Haozhe Zhao
Zefan Cai
Yuchi Wang
Peiyi Wang
Tianyu Liu
Baobao Chang
LM&Ro
LLMAG
182
44
0
03 Oct 2023
AutoAgents: A Framework for Automatic Agent Generation
Guangyao Chen
Siwei Dong
Yu Shu
Ge Zhang
Jaward Sesay
Börje F. Karlsson
Jie Fu
Yemin Shi
LLMAG
124
130
0
29 Sep 2023
Ragas: Automated Evaluation of Retrieval Augmented Generation
Shahul Es
Jithin James
Luis Espinosa-Anke
Steven Schockaert
145
205
0
26 Sep 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
117
33
0
23 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
139
78
0
21 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
124
81
0
13 Sep 2023
Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang
Qingwen Bu
Jie M. Zhang
Xiaofei Xie
Junjie Chen
Heming Cui
125
27
0
03 Sep 2023
Rational Decision-Making Agent with Internalized Utility Judgment
Yining Ye
Xin Cong
Shizuo Tian
Yujia Qin
Chong Liu
Y. Lin
Zhiyuan Liu
Maosong Sun
LLMAG
91
8
0
24 Aug 2023
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
LRM
OSLM
303
468
0
18 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Bowen Yu
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
146
90
0
03 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
613
4,459
0
09 Jun 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
119
82
0
23 May 2023
Lion: Adversarial Distillation of Proprietary Large Language Models
Yuxin Jiang
Chunkit Chan
Yin Hua
Wei Wang
ALM
108
25
0
22 May 2023
GPTScore: Evaluate as You Desire
Jinlan Fu
See-Kiong Ng
Zhengbao Jiang
Pengfei Liu
LM&MA
ALM
ELM
194
292
0
08 Feb 2023