arXiv: 2505.16222
Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation
22 May 2025
Jiwon Moon, Yerin Hwang, Dongryeol Lee, Taegwan Kang, Yongil Kim, Kyomin Jung
Tags: ELM
Papers citing "Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation" (5 papers)
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini, A. Orsino, Massimo Ruggiero, Domenico Talia
Tags: AAML, ELM
10 Apr 2025
LLMs can be easily Confused by Instructional Distractions
Yerin Hwang, Yongil Kim, Jahyun Koo, Taegwan Kang, Hyunkyung Bae, Kyomin Jung
05 Feb 2025
Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu, Yuxuan Lu, Grant Schoenebeck, Yuqing Kong
11 Nov 2024
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung
Tags: ELM
28 Oct 2024
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca A. Popa, Ion Stoica
Tags: ELM, ALM
16 Oct 2024