ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2411.16646
  4. Cited By
Self-Generated Critiques Boost Reward Modeling for Language Models

Self-Generated Critiques Boost Reward Modeling for Language Models

25 November 2024
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
Richard Yuanzhe Pang
Yundi Qian
Xuewei Wang
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
    LRM
    ALM
ArXivPDFHTML

Papers citing "Self-Generated Critiques Boost Reward Modeling for Language Models"

14 / 14 papers shown
Title
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Zihan Wang
Jiaqi Zeng
Olivier Delalleau
Hoo-Chang Shin
Felipe Soares
Alexander Bukharin
Ellie Evans
Yi Dong
Oleksii Kuchaiev
22
0
0
16 May 2025
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
19
0
0
15 May 2025
RM-R1: Reward Modeling as Reasoning
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
183
1
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Z. Zhang
Tingting Gao
Liang Wang
OffRL
LRM
43
0
0
05 May 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
91
0
0
27 Apr 2025
Heimdall: test-time scaling on the generative verification
Heimdall: test-time scaling on the generative verification
Wenlei Shi
Xing Jin
LRM
31
1
0
14 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
45
2
0
12 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Liu
Y. Wu
OffRL
LRM
46
13
0
03 Apr 2025
Probabilistic Uncertain Reward Model
Probabilistic Uncertain Reward Model
Wangtao Sun
Xiang Cheng
Xing Yu
Haotian Xu
Zhao Yang
Shizhu He
Jun Zhao
Kang Liu
60
0
0
28 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
91
16
0
27 Mar 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng
Y. Qi
Xiaozhi Wang
Zijun Yao
Bin Xu
Lei Hou
Juanzi Li
ALM
LRM
62
4
0
26 Feb 2025
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen
Jiahao Xu
Tian Liang
Zhiwei He
Jianhui Pang
...
Zhenru Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
ReLM
58
101
0
30 Dec 2024
In Context Learning and Reasoning for Symbolic Regression with Large Language Models
In Context Learning and Reasoning for Symbolic Regression with Large Language Models
Samiha Sharlin
Tyler R. Josephson
ReLM
LLMAG
LRM
47
1
0
22 Oct 2024
1