Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
arXiv: 2410.08146
10 October 2024
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
OffRL
LRM
Papers citing "Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning" (50 / 59 papers shown)
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
Jaehoon Yun
Jiwoong Sohn
Jungwoo Park
Hyunjae Kim
Xiangru Tang
...
Minhyeok Ko
Qingyu Chen
Mark B. Gerstein
Michael Moor
Jaewoo Kang
LRM
LM&MA
24
0
0
13 Jun 2025
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang
Zhenhua Liu
Jiaheng Wei
Xuanwu Yin
Dong Li
E. Barsoum
LRM
86
0
0
11 Jun 2025
A Survey on Large Language Models for Mathematical Reasoning
Peng-Yuan Wang
Tian-Shuo Liu
Chenyang Wang
Yi-Di Wang
Shu Yan
...
Xu-Hui Liu
Xin-Wei Chen
Jia-Cheng Xu
Ziniu Li
Yang Yu
LRM
35
0
0
10 Jun 2025
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Amrith Rajagopal Setlur
Matthew Y. R. Yang
Charlie Snell
Jeremy Greer
Ian Wu
Virginia Smith
Max Simchowitz
Aviral Kumar
LRM
49
0
0
10 Jun 2025
Intra-Trajectory Consistency for Reward Modeling
Chaoyang Zhou
Shunyu Liu
Zengmao Wang
Di Wang
Rong-Cheng Tu
Bo Du
Dacheng Tao
52
0
0
10 Jun 2025
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
Junhong Shen
Hao Bai
Lunjun Zhang
Yifei Zhou
Amrith Rajagopal Setlur
...
Diego Caples
Nan Jiang
Tong Zhang
Ameet Talwalkar
Aviral Kumar
LLMAG
LRM
30
0
0
09 Jun 2025
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Lu Ma
Hao Liang
Meiyi Qiang
Lexiang Tang
Xiaochen Ma
...
Junbo Niu
Chengyu Shen
Runming He
Bin Cui
Wentao Zhang
ReLM
OffRL
LRM
26
0
0
09 Jun 2025
AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
Zixuan Jiang
Renjing Xu
25
0
0
08 Jun 2025
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification
Chengwu Liu
Ye Yuan
Yichun Yin
Yan Xu
Xin Xu
Zaoyu Chen
Yasheng Wang
Lifeng Shang
Qun Liu
Ming Zhang
LRM
148
0
0
05 Jun 2025
Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models
Fei Ding
Baiqiao Wang
Zijian Zeng
Youwei Wang
LRM
94
0
0
05 Jun 2025
RAST: Reasoning Activation in LLMs via Small-model Transfer
Siru Ouyang
Xinyu Zhu
Zilin Xiao
Minhao Jiang
Yu Meng
Jiawei Han
OffRL
ReLM
LRM
31
0
0
30 May 2025
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
Xinglin Wang
Yiwei Li
Shaoxiong Feng
Peiwen Yuan
Y. Zhang
Jiayi Shi
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
LRM
27
0
0
30 May 2025
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo
Lijie Xu
Jie Liu
Dan Ye
Shuang Qiu
OffRL
91
0
0
29 May 2025
Discriminative Policy Optimization for Token-Level Reward Models
Hongzhan Chen
Tao Yang
Shiping Gao
Ruijun Chen
Xiaojun Quan
Hongtao Tian
Ting Yao
42
0
0
29 May 2025
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An
Ruochen Wang
Tianyi Zhou
Cho-Jui Hsieh
KELM
LRM
94
1
0
27 May 2025
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang
Jin Peng Zhou
Jonathan D. Chang
Zhaolin Gao
Nathan Kallus
Kianté Brantley
Wen Sun
LRM
92
1
0
23 May 2025
AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
Woosung Koh
Wonbeen Oh
Jaein Jang
MinHyung Lee
Hyeongjin Kim
Ah Yeon Kim
Joonkee Kim
Junghyun Lee
Taehyeon Kim
Se-Young Yun
LRM
TTA
119
0
0
22 May 2025
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Yuchen Yan
Jin Jiang
Zhenbang Ren
Yijun Li
Xudong Cai
...
Mengdi Zhang
Jian Shao
Yongliang Shen
Jun Xiao
Yueting Zhuang
OffRL
ALM
LRM
139
0
0
21 May 2025
Advancing LLM Safe Alignment with Safety Representation Ranking
Tianqi Du
Zeming Wei
Quan Chen
Chenheng Zhang
Yisen Wang
ALM
87
1
0
21 May 2025
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Shivam Agarwal
Zimin Zhang
Lifan Yuan
Jiawei Han
Hao Peng
167
8
0
21 May 2025
Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning
Zhaohui Yang
Chenghua He
Xiaowen Shi
Linjing Li
Qiyue Yin
Shihong Deng
D. Jiang
LRM
41
0
0
20 May 2025
Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents
Karina Zainullina
Alexander Golubev
Maria Trofimova
Sergei Polezhaev
Ibragim Badertdinov
...
Filipp Fisin
Sergei Skvortsov
Maksim Nekrashevich
Anton Shevtsov
Boris Yangel
62
0
0
19 May 2025
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
Hao Mark Chen
Guanxi Lu
Yasuyuki Okoshi
Zhiwen Mo
Masato Motomura
Hongxiang Fan
LRM
116
0
0
16 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
96
0
0
16 May 2025
Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition
Bo Yue
Shuqi Guo
Kaiyu Hu
Chujiao Wang
Benyou Wang
Kui Jia
Guiliang Liu
LRM
111
0
0
16 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Yu Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
393
21
0
05 May 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
Jianmin Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
462
5
0
21 Apr 2025
Weight Ensembling Improves Reasoning in Language Models
Xingyu Dang
Christina Baek
Kaiyue Wen
Zico Kolter
Aditi Raghunathan
MoMe
LRM
117
4
0
14 Apr 2025
Reasoning without Regret
Tarun Chitra
OffRL
LRM
83
0
0
14 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
134
11
0
12 Apr 2025
VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT
Zhuo Zhi
Qiangqiang Wu
Minghe Shen
Wenbo Li
Yinchuan Li
Kun Shao
Kaiwen Zhou
LLMAG
180
3
0
06 Apr 2025
Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning
Ram Ramrakhya
Matthew Chang
Xavier Puig
Ruta Desai
Z. Kira
Roozbeh Mottaghi
LLMAG
LM&Ro
121
1
0
01 Apr 2025
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
Jixuan Leng
Chengsong Huang
Langlin Huang
Bill Yuchen Lin
William W. Cohen
Haohan Wang
Jiaxin Huang
LRM
166
1
0
30 Mar 2025
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Jiakai Tang
Sunhao Dai
Teng Shi
Jun Xu
X. Chen
Wen Chen
Wu Jian
Yuning Jiang
LRM
158
10
0
28 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
429
4
0
26 Mar 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Yifei Zhou
Song Jiang
Yuandong Tian
Jason Weston
Sergey Levine
Sainbayar Sukhbaatar
Xian Li
LLMAG
LRM
151
15
0
19 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
100
3
0
18 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
169
2
0
16 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
100
10
0
13 Mar 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei
Yijun Yang
Junliang Xing
Yuanchun Shi
Zongqing Lu
Deheng Ye
OffRL
LRM
83
2
0
11 Mar 2025
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Yuxiao Qu
Matthew Y. R. Yang
Amrith Rajagopal Setlur
Lewis Tunstall
E. Beeching
Ruslan Salakhutdinov
Aviral Kumar
OffRL
159
49
0
10 Mar 2025
Process-Supervised LLM Recommenders via Flow-guided Tuning
Chongming Gao
Mengyao Gao
Chenxiao Fan
Shuai Yuan
Wentao Shi
Xiangnan He
136
7
0
10 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
246
88
0
10 Mar 2025
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
Ayeong Lee
Ethan Che
Tianyi Peng
LRM
123
34
0
03 Mar 2025
Multi-Turn Code Generation Through Single-Step Rewards
A. Jain
Gonzalo Gonzalez-Pumariega
Wayne Chen
Alexander M. Rush
Wenting Zhao
Sanjiban Choudhury
LRM
88
3
0
27 Feb 2025
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?
Yudi Zhang
Lu Wang
Meng Fang
Yali Du
Chenghua Huang
...
Qingwei Lin
Mykola Pechenizkiy
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
ALM
143
2
0
26 Feb 2025
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction
Suozhi Huang
Peiyang Song
Robert Joseph George
Anima Anandkumar
AI4TS
LRM
96
2
0
25 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
414
3
0
21 Feb 2025
S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma
Peisong Wang
Cheng Liu
Xingyan Liu
Jiaqi Chen
Bang Zhang
Xin Zhou
Nan Du
Jia Li
LRM
116
4
0
18 Feb 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang
Zhaochen Yu
Tengjiao Wang
Mengdi Wang
ReLM
LRM
AI4CE
187
18
0
10 Feb 2025