
Self-Generated Critiques Boost Reward Modeling for Language Models

25 November 2024

Yue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou
Abstract

Reward modeling is crucial for aligning large language models (LLMs) with human preferences, especially in reinforcement learning from human feedback (RLHF). However, current reward models mainly produce scalar scores and struggle to incorporate critiques in natural language. We hypothesize that predicting both critiques and the scalar reward would improve reward modeling ability. Motivated by this, we propose Critic-RM, a framework that improves reward models using self-generated critiques without extra supervision. Critic-RM employs a two-stage process: generating and filtering high-quality critiques, followed by joint fine-tuning on reward prediction and critique generation. Experiments across benchmarks show that Critic-RM improves reward modeling accuracy by 3.7%-7.3% over standard reward models and LLM judges, demonstrating strong performance and data efficiency. Additional studies further validate that the generated critiques help rectify flawed reasoning steps, yielding 2.5%-3.2% gains in reasoning accuracy.
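To make the second stage concrete, below is a minimal PyTorch sketch of what a joint objective of this kind could look like: a standard Bradley-Terry pairwise loss on the scalar rewards for a chosen/rejected response pair, plus a next-token cross-entropy loss on the filtered self-generated critique tokens. The function name, tensor shapes, and the lm_weight coefficient are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def critic_rm_joint_loss(
    reward_chosen,        # (B,) scalar rewards for the preferred responses
    reward_rejected,      # (B,) scalar rewards for the rejected responses
    critique_logits,      # (B, T, V) LM logits over critique token positions
    critique_targets,     # (B, T) token ids of the filtered self-generated critique
    lm_weight=0.5,        # assumed weighting between the two terms; the paper's may differ
):
    # Bradley-Terry preference loss: push r(chosen) above r(rejected).
    preference_loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()

    # Cross-entropy on critique tokens, so the model learns to write the
    # critique alongside its scalar judgment; -100 masks prompt/padding.
    critique_loss = F.cross_entropy(
        critique_logits.reshape(-1, critique_logits.size(-1)),
        critique_targets.reshape(-1),
        ignore_index=-100,
    )

    return preference_loss + lm_weight * critique_loss

# Smoke test with random tensors (batch of 4, 16 critique tokens, vocab 32).
if __name__ == "__main__":
    B, T, V = 4, 16, 32
    loss = critic_rm_joint_loss(
        torch.randn(B), torch.randn(B),
        torch.randn(B, T, V), torch.randint(0, V, (B, T)),
    )
    print(loss.item())

Weighting the critique term lets the preference signal and the generation signal be balanced against each other; how Critic-RM actually trades off the two objectives is specified in the paper itself.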

@article{yu2025_2411.16646,
  title={Self-Generated Critiques Boost Reward Modeling for Language Models},
  author={Yue Yu and Zhengxing Chen and Aston Zhang and Liang Tan and Chenguang Zhu and Richard Yuanzhe Pang and Yundi Qian and Xuewei Wang and Suchin Gururangan and Chao Zhang and Melanie Kambadur and Dhruv Mahajan and Rui Hou},
  journal={arXiv preprint arXiv:2411.16646},
  year={2025}
}