arXiv:2505.23474
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns
29 May 2025
Xiang Li
Haiyang Yu
Xinghua Zhang
Ziyang Huang
Shizhu He
Kang Liu
Jun Zhao
Fei Huang
Yongbin Li
LRM
Papers citing
"Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns"
19 / 19 papers shown
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute
Yingwei Ma
Binhua Li
Yihong Dong
Xue Jiang
Rongyu Cao
Jingshu Chen
Fei Huang
Yongqian Li
LLMAG
LRM
124
7
0
31 Mar 2025
Process Reward Models for LLM Agents: Practical Framework and Directions
Sanjiban Choudhury
72
11
0
17 Feb 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Mingyang Song
Zhaochen Su
Xiaoye Qu
Jiawei Zhou
Yu Cheng
LRM
136
40
0
06 Jan 2025
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Chujie Zheng
Zizhuo Zhang
Beichen Zhang
Runji Lin
Keming Lu
Bowen Yu
Dayiheng Liu
Jingren Zhou
Junyang Lin
LRM
201
77
0
09 Dec 2024
MTMT: Consolidating Multiple Thinking Modes to Form a Thought Tree for Strengthening LLM
Changcheng Li
Xiangyu Wang
Qiuju Chen
Xiren Zhou
Huanhuan Chen
LRM
68
2
0
05 Dec 2024
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Yantao Liu
Zijun Yao
Rui Min
Yixin Cao
Lei Hou
Juanzi Li
OffRL
ALM
108
42
0
21 Oct 2024
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
An Yang
Beichen Zhang
Binyuan Hui
Bofei Gao
Bowen Yu
...
Mingfeng Xue
Runji Lin
Tianyu Liu
Xingzhang Ren
Zhenru Zhang
OSLM
LRM
125
320
0
18 Sep 2024
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Hritik Bansal
Arian Hosseini
Rishabh Agarwal
Vinh Q. Tran
Mehran Kazemi
SyDa
OffRL
LRM
102
49
0
29 Aug 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Derek F. Wong
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM
LRM
104
31
0
11 Jul 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Hongwei Liu
Zilong Zheng
Yuxuan Qiao
Haodong Duan
Zhiwei Fei
Fengzhe Zhou
Wenwei Zhang
Songyang Zhang
Dahua Lin
Kai-xiang Chen
114
68
0
20 May 2024
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Philipp Mondorf
Barbara Plank
ELM
LRM
LM&MA
156
51
0
02 Apr 2024
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Zicheng Lin
Zhibin Gou
Tian Liang
Ruilin Luo
Haowei Liu
Yujiu Yang
LRM
84
56
0
22 Feb 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM
AIMat
132
282
0
21 Feb 2024
ReFT: Reasoning with Reinforced Fine-Tuning
Trung Quoc Luong
Xinbo Zhang
Zhanming Jie
Peng Sun
Xiaoran Jin
Hang Li
OffRL
LRM
ReLM
100
131
0
17 Jan 2024
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Peiyi Wang
Lei Li
Zhihong Shao
R. X. Xu
Damai Dai
Yifei Li
Deli Chen
Y. Wu
Zhifang Sui
AIMat
LRM
ALM
155
398
0
14 Dec 2023
Large Language Model for Science: A Study on P vs. NP
Qingxiu Dong
Li Dong
Ke Xu
Guangyan Zhou
Y. Hao
Zhifang Sui
Furu Wei
LRM
38
17
0
11 Sep 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
198
1,240
0
31 May 2023
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
502
10,526
0
17 Jun 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM
FaML
209
2,407
0
05 Mar 2021