Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation

22 May 2025
Jiwon Moon, Yerin Hwang, Dongryeol Lee, Taegwan Kang, Yongil Kim, Kyomin Jung
ELM
arXiv: 2505.16222 (abs · PDF · HTML)

Papers citing "Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation"

5 / 5 papers shown

Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini, A. Orsino, Massimo Ruggiero, Domenico Talia
AAML, ELM · 10 Apr 2025

LLMs can be easily Confused by Instructional Distractions
Yerin Hwang, Yongil Kim, Jahyun Koo, Taegwan Kang, Hyunkyung Bae, Kyomin Jung
05 Feb 2025

Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu, Yuxuan Lu, Grant Schoenebeck, Yuqing Kong
11 Nov 2024

Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung
ELM · 28 Oct 2024

JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca A. Popa, Ion Stoica
ELM, ALM · 16 Oct 2024