2505.16265
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
22 May 2025
Ilgee Hong
Changlong Yu
Liang Qiu
Weixiang Yan
Zhenghao Xu
Haoming Jiang
Qingru Zhang
Qin Lu
Xin Liu
Chao Zhang
Tuo Zhao
OffRL
ReLM
LRM
Papers citing
"Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models"
3 / 3 papers shown
Title
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
384
2,022
0
22 Jan 2025
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
...
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM
ALM
177
27
0
25 Nov 2024
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu
Wei Xiong
Jie Jessie Ren
Lichang Chen
Junru Wu
...
Yuan Liu
Bilal Piot
Abe Ittycheriah
Aviral Kumar
Mohammad Saleh
AAML
93
23
0
20 Sep 2024