Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation

22 May 2025
Jiwon Moon, Yerin Hwang, Dongryeol Lee, Taegwan Kang, Yongil Kim, Kyomin Jung
ELM
arXiv: 2505.16222 (abs · PDF · HTML)

Papers citing "Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation"

5 / 5 papers shown

Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini, A. Orsino, Massimo Ruggiero, Domenico Talia
AAML, ELM · 10 Apr 2025

LLMs can be easily Confused by Instructional Distractions
Yerin Hwang, Yongil Kim, Jahyun Koo, Taegwan Kang, Hyunkyung Bae, Kyomin Jung
05 Feb 2025

Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu, Yuxuan Lu, Grant Schoenebeck, Yuqing Kong
11 Nov 2024

Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung
ELM · 28 Oct 2024

JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca A. Popa, Ion Stoica
ELM, ALM · 16 Oct 2024