Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

22 May 2025
Ilgee Hong
Changlong Yu
Liang Qiu
Weixiang Yan
Zhenghao Xu
Haoming Jiang
Qingru Zhang
Qin Lu
Xin Liu
Chao Zhang
Tuo Zhao
    OffRL, ReLM, LRM

Papers citing "Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models"

3 / 3 papers shown
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
384
2,022
0
22 Jan 2025
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
...
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM, ALM
177
27
0
25 Nov 2024
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu
Wei Xiong
Jie Jessie Ren
Lichang Chen
Junru Wu
...
Yuan Liu
Bilal Piot
Abe Ittycheriah
Aviral Kumar
Mohammad Saleh
AAML
93
23
0
20 Sep 2024