Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2411.16646
Cited By
Self-Generated Critiques Boost Reward Modeling for Language Models
25 November 2024
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
Richard Yuanzhe Pang
Yundi Qian
Xuewei Wang
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Self-Generated Critiques Boost Reward Modeling for Language Models"
14 / 14 papers shown
Title
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Zihan Wang
Jiaqi Zeng
Olivier Delalleau
Hoo-Chang Shin
Felipe Soares
Alexander Bukharin
Ellie Evans
Yi Dong
Oleksii Kuchaiev
22
0
0
16 May 2025
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
19
0
0
15 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
183
1
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Z. Zhang
Tingting Gao
Liang Wang
OffRL
LRM
43
0
0
05 May 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
91
0
0
27 Apr 2025
Heimdall: test-time scaling on the generative verification
Wenlei Shi
Xing Jin
LRM
31
1
0
14 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
45
2
0
12 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Liu
Y. Wu
OffRL
LRM
46
13
0
03 Apr 2025
Probabilistic Uncertain Reward Model
Wangtao Sun
Xiang Cheng
Xing Yu
Haotian Xu
Zhao Yang
Shizhu He
Jun Zhao
Kang Liu
60
0
0
28 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
91
16
0
27 Mar 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng
Y. Qi
Xiaozhi Wang
Zijun Yao
Bin Xu
Lei Hou
Juanzi Li
ALM
LRM
62
4
0
26 Feb 2025
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen
Jiahao Xu
Tian Liang
Zhiwei He
Jianhui Pang
...
Zhenru Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
ReLM
58
101
0
30 Dec 2024
In Context Learning and Reasoning for Symbolic Regression with Large Language Models
Samiha Sharlin
Tyler R. Josephson
ReLM
LLMAG
LRM
47
1
0
22 Oct 2024
1