2504.02495
Cited By
Inference-Time Scaling for Generalist Reward Modeling
3 April 2025
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRL
LRM
Papers citing
"Inference-Time Scaling for Generalist Reward Modeling"
50 / 55 papers shown
Title
Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics
Ran Zhang
Mohannad Elhamod
47
0
0
29 May 2025
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
Zijun Liu
Zhennan Wan
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
LLMAG
60
0
0
27 May 2025
Flex-Judge: Think Once, Judge Anywhere
Jongwoo Ko
S. Kim
Sungwoo Cho
Se-Young Yun
ELM
LRM
186
0
0
24 May 2025
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios
Bin Xu
Yu Bai
Huashan Sun
Yiguan Lin
Siming Liu
Xinyue Liang
Yaolin Li
Yang Gao
Heyan Huang
AI4Ed
ELM
167
0
0
22 May 2025
Latent Principle Discovery for Language Model Self-Improvement
Keshav Ramji
Tahira Naseem
Ramón Fernandez Astudillo
LRM
83
0
0
22 May 2025
AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
Woosung Koh
Wonbeen Oh
Jaein Jang
MinHyung Lee
Hyeongjin Kim
Ah Yeon Kim
Joonkee Kim
Junghyun Lee
Taehyeon Kim
Se-Young Yun
LRM
TTA
85
0
0
22 May 2025
SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation
Wenjie Yang
Mao Zheng
Mingyang Song
Zheng Li
OffRL
LRM
54
0
0
22 May 2025
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Shivam Agarwal
Zimin Zhang
Lifan Yuan
Jiawei Han
Hao Peng
96
6
0
21 May 2025
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
Wei Liu
Siya Qi
Xinyu Wang
Chen Qian
Yali Du
Yulan He
OffRL
LRM
63
0
0
21 May 2025
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Yuchen Yan
Jin Jiang
Zhenbang Ren
Yijun Li
Xudong Cai
...
Mengdi Zhang
Jian Shao
Yongliang Shen
Jun Xiao
Yueting Zhuang
OffRL
ALM
LRM
82
0
0
21 May 2025
Think-J: Learning to Think for Generative LLM-as-a-Judge
Hui Huang
Yancheng He
Hongli Zhou
Rui Zhang
Wei Liu
Weixun Wang
Wenbo Su
Bo Zheng
Jiaheng Liu
LLMAG
AILaw
ELM
LRM
56
1
0
20 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM
LRM
74
0
0
19 May 2025
MR. Judge: Multimodal Reasoner as a Judge
Renjie Pi
Felix Bai
Qibin Chen
Simon Wang
Jiulong Shan
Kieran Liu
Meng Cao
ELM
LRM
80
0
0
19 May 2025
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo
Zhaomin Wu
Philip S. Yu
59
0
0
18 May 2025
Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier
Jianyuan Zhong
Zhiyu Li
Zhijian Xu
Xiangyu Wen
Kezhi Li
Jianyuan Zhong
LRM
50
0
0
17 May 2025
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Zhilin Wang
Jiaqi Zeng
Olivier Delalleau
Hoo-Chang Shin
Felipe Soares
Alexander Bukharin
Ellie Evans
Yi Dong
Oleksii Kuchaiev
80
2
0
16 May 2025
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
73
5
0
15 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Yu Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
301
15
0
05 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Zheng Zhang
Yan Li
Liang Wang
OffRL
LRM
84
6
0
05 May 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
138
5
0
05 May 2025
GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets
Mingqian He
Fei Zhao
Chonggang Lu
Ziqiang Liu
Yun Wang
Haofu Qian
OffRL
AI4TS
VLM
90
1
0
28 Apr 2025
Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family
Pierre-Carl Langlais
Pavel Chizhov
Mattia Nee
Carlos Rosas Hinostroza
Matthieu Delsart
Irène Girard
Othman Hicheur
Anastasia Stasenko
Ivan P. Yamshchikov
LRM
84
0
0
25 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
69
3
0
04 Apr 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
138
30
0
14 Mar 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
108
8
0
27 Feb 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng
Yunjia Qi
Xiaozhi Wang
Zijun Yao
Bin Xu
Lei Hou
Juanzi Li
ALM
LRM
83
6
0
26 Feb 2025
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu
Shaoning Sun
Xiaohui Hu
Jiaxu Yan
Kaidong Yu
Xuelong Li
ELM
126
5
0
17 Feb 2025
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Mrinank Sharma
Meg Tong
Jesse Mu
Jerry Wei
Jorrit Kruthoff
...
Ruiqi Zhong
Giulio Zhou
Jan Leike
Jared Kaplan
Ethan Perez
184
28
0
31 Jan 2025
Atla Selene Mini: A General Purpose Evaluation Model
Andrei Alexandru
Antonia Calvi
Henry Broomfield
Jackson Golden
Kyle Dai
...
Max Bartolo
Roman Engeler
Sashank Pisupati
Toby Drane
Young Sun Park
ALM
ELM
AILaw
LM&MA
LRM
68
6
0
27 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
323
1,641
0
22 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
209
299
0
03 Jan 2025
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu
Zhengxing Chen
Aston Zhang
L. Tan
Chenguang Zhu
...
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM
ALM
137
24
0
25 Nov 2024
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Chris Yuhao Liu
Liang Zeng
Qingbin Liu
Rui Yan
Jujie He
Chaojie Wang
Shuicheng Yan
Yang Liu
Yahui Zhou
AI4TS
95
100
0
24 Oct 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
Enyu Zhou
Guodong Zheng
Binghai Wang
Zhiheng Xi
Shihan Dou
...
Yurong Mou
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
101
19
0
13 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
99
11
0
03 Oct 2024
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang
Alexander Bukharin
Olivier Delalleau
Daniel Egert
Gerald Shen
Jiaqi Zeng
Oleksii Kuchaiev
Yi Dong
ALM
96
53
0
02 Oct 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
110
176
0
22 Jun 2024
Improving Reward Models with Synthetic Critiques
Zihuiwen Ye
Fraser Greenlee-Scott
Max Bartolo
Phil Blunsom
Jon Ander Campos
Matthias Gallé
ALM
SyDa
LRM
61
23
0
31 May 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMe
ALM
ELM
89
198
0
02 May 2024
MCRanker: Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers
Fang Guo
Wenyu Li
Honglei Zhuang
Yun Luo
Yafu Li
Qi Zhu
Le Yan
Yue Zhang
ALM
97
8
0
18 Apr 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie
Danyang Zhang
Jixuan Chen
Xiaochuan Li
Siheng Zhao
...
Shuyan Zhou
Silvio Savarese
Caiming Xiong
Victor Zhong
Tao Yu
81
161
0
11 Apr 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James V. Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
136
252
0
20 Mar 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
306
327
0
18 Jan 2024
Tool-Augmented Reward Modeling
Lei Li
Yekun Chai
Shuohuan Wang
Yu Sun
Hao Tian
Ningyu Zhang
Hua Wu
OffRL
75
13
0
02 Oct 2023
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLM
ALM
93
129
0
14 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
316
4,288
0
09 Jun 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
146
1,140
0
31 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Songlin Yang
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
76
330
0
04 May 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
171
1,611
0
15 Dec 2022
CodeT: Code Generation with Generated Tests
Bei Chen
Fengji Zhang
A. Nguyen
Daoguang Zan
Zeqi Lin
Jian-Guang Lou
Weizhu Chen
75
337
0
21 Jul 2022