JudgeLRM: Large Reasoning Models as a Judge

31 March 2025
Nuo Chen
Zhiyuan Hu
Qingyun Zou
Jiaying Wu
Qian Wang
Bryan Hooi
Bingsheng He
Abstract

The rise of Large Language Models (LLMs) as evaluators offers a scalable alternative to human annotation, yet existing Supervised Fine-Tuning (SFT) approaches for judges often fall short in domains requiring complex reasoning. In this work, we investigate whether LLM judges truly benefit from enhanced reasoning capabilities. Through a detailed analysis of reasoning requirements across evaluation tasks, we reveal a negative correlation between SFT performance gains and the proportion of reasoning-demanding samples, highlighting the limitations of SFT in such scenarios. To address this, we introduce JudgeLRM, a family of judgment-oriented LLMs trained using reinforcement learning (RL) with judge-wise, outcome-driven rewards. JudgeLRM models consistently outperform both SFT-tuned and state-of-the-art reasoning models. Notably, JudgeLRM-3B surpasses GPT-4, and JudgeLRM-7B outperforms DeepSeek-R1 by 2.79% in F1 score, particularly excelling in judge tasks requiring deep reasoning.
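The abstract attributes JudgeLRM's gains to RL with judge-wise, outcome-driven rewards but does not spell out the reward function here. The sketch below is a rough illustration of how such a reward could be computed for a pairwise judging task; the tag format, score parsing, and reward weights are assumptions made for this example, not the paper's actual design.

import re

# Assumed output format (illustration only): the judge model emits
# "<think>...</think><answer>SCORE_A SCORE_B</answer>", scoring two
# candidate responses, and is rewarded for agreeing with the gold label.
ANSWER_RE = re.compile(r"<answer>\s*(\d+)\s+(\d+)\s*</answer>")

def judge_reward(completion: str, gold_preference: int) -> float:
    """Outcome-driven reward for a pairwise judge.

    gold_preference: 0 if answer A is better, 1 if answer B is better.
    Returns a small format bonus plus an outcome reward; both values
    are hypothetical weights chosen for this sketch.
    """
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # malformed output earns nothing
    score_a, score_b = int(match.group(1)), int(match.group(2))
    format_bonus = 0.1  # reward for well-formed, parseable output
    # Derive the judge's predicted preference from its scores
    # (-1 means a tie, which never matches a binary gold label).
    predicted = 0 if score_a > score_b else 1 if score_b > score_a else -1
    outcome = 1.0 if predicted == gold_preference else 0.0
    return format_bonus + outcome

# Example: the judge prefers answer A (8 > 5) and the gold label agrees.
print(judge_reward("<think>A is more accurate.</think><answer>8 5</answer>", 0))
# -> 1.1

Because the reward depends only on the final judgment, not on the reasoning trace itself, the policy is free to discover whatever chain of thought best predicts the correct outcome, which is the usual motivation for outcome-driven rewards over per-step supervision.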

@article{chen2025_2504.00050,
  title={JudgeLRM: Large Reasoning Models as a Judge},
  author={Nuo Chen and Zhiyuan Hu and Qingyun Zou and Jiaying Wu and Qian Wang and Bryan Hooi and Bingsheng He},
  journal={arXiv preprint arXiv:2504.00050},
  year={2025}
}