ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models

20 June 2025
Bin Chen
Xinzge Gao
Chuanrui Hu
Penghang Yu
Hua Zhang
Bing-Kun Bao
Main: 8 pages · Appendix: 6 pages · Bibliography: 1 page · 7 figures · 6 tables
Abstract

Generative Reward Models (GRMs) provide greater flexibility than scalar reward models in capturing human preferences, but their effectiveness is limited by poor reasoning capabilities. This often results in incomplete or overly speculative reasoning paths, leading to hallucinations or missing key information in complex tasks. We address this challenge with ReasonGRM, a three-stage generative reward modeling framework. In the first stage, Zero-RL is used to generate concise, outcome-directed reasoning paths that reduce the likelihood of critical omissions. In the second stage, we introduce a novel evaluation metric, R⋆, which scores reasoning paths based on their generation likelihood. This favors paths that reach correct answers with minimal exploration, helping to reduce hallucination-prone data during training. In the final stage, the model is further refined through reinforcement learning on challenging examples to enhance its preference discrimination capabilities. Experiments on three public benchmarks show that ReasonGRM achieves competitive or state-of-the-art performance, outperforming previous best GRMs by 1.8% on average and surpassing proprietary models such as GPT-4o by up to 5.6%. These results demonstrate the effectiveness of reasoning-aware training and highlight the importance of high-quality rationale selection for reliable preference modeling.
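
The abstract describes R⋆ only at a high level: it scores a reasoning path by how readily the model generates it, keeping paths that reach the correct answer with minimal exploration. The sketch below shows one plausible reading of such a likelihood-based score, assuming a HuggingFace-style causal LM. The model name, the mean-log-probability normalization, and the zero score for incorrect paths are illustrative assumptions, not the paper's definition.

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder base model; the paper's actual backbone may differ.
MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def mean_token_logprob(prompt: str, reasoning: str) -> float:
    """Mean per-token log-probability of `reasoning` conditioned on `prompt`.
    Tokenizing prompt and prompt+reasoning separately is a simplification:
    token boundaries may not align exactly at the seam."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + reasoning, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # Logits at position i predict token i+1, so shift the window by one.
    log_probs = torch.log_softmax(logits[0, prompt_len - 1 : -1], dim=-1)
    targets = full_ids[0, prompt_len:]
    token_lp = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()

def r_star_score(prompt: str, reasoning: str, reaches_correct_answer: bool) -> float:
    """Assumed form of an R*-style score: reward only paths that reach the
    correct answer, weighted by how readily the model generates them."""
    if not reaches_correct_answer:
        return 0.0
    return math.exp(mean_token_logprob(prompt, reasoning))

Under this reading, a path the model generates confidently that also ends in the right answer scores near 1, while meandering or incorrect paths score low and would be filtered out of the stage-two training data.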

@article{chen2025_2506.16712,
  title={ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models},
  author={Bin Chen and Xinzge Gao and Chuanrui Hu and Penghang Yu and Hua Zhang and Bing-Kun Bao},
  journal={arXiv preprint arXiv:2506.16712},
  year={2025}
}