2405.01724
Cited By
Large Language Models are Inconsistent and Biased Evaluators
2 May 2024
Rickard Stureborg
Dimitris Alikaniotis
Yoshi Suhara
ALM
Papers citing
"Large Language Models are Inconsistent and Biased Evaluators"
14 / 14 papers shown
To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay
Soumik Dey
Hansi Wu
Binbin Li
45
0
0
07 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
Hao Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
24
0
0
04 May 2025
Towards Automated Scoping of AI for Social Good Projects
Jacob Emmerson
Rayid Ghani
Zheyuan Ryan Shi
142
0
0
28 Apr 2025
Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
Xiaotian Zhang
Ruizhe Chen
Yang Feng
Zuozhu Liu
40
0
0
17 Apr 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
50
0
0
24 Mar 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
127
9
0
05 Feb 2025
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
...
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM
ALM
96
16
0
25 Nov 2024
FactLens: Benchmarking Fine-Grained Fact Verification
Kushan Mitra
Dan Zhang
Sajjadur Rahman
Estevam R. Hruschka
HILM
42
1
0
08 Nov 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeshkpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
70
1
0
17 Oct 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei
Shenghua He
Tian Xia
Andy H. Wong
Jingyang Lin
Mei Han
ALM
ELM
64
23
0
23 Aug 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
29
28
0
04 Jul 2024
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis
Timo Kaufmann
Eyke Hüllermeier
Samuel Albanie
Robert Mullins
SyDa
46
9
0
02 Jun 2024
Tailoring Vaccine Messaging with Common-Ground Opinions
Rickard Stureborg
Sanxing Chen
Ruoyu Xie
Aayushi Patel
Christopher Li
Chloe Qinyu Zhu
Tingnan Hu
Jun Yang
Bhuwan Dhingra
42
0
0
17 May 2024
On the Limitations of Reference-Free Evaluations of Generated Text
Daniel Deutsch
Rotem Dror
Dan Roth
37
45
0
22 Oct 2022