Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns


29 May 2025
Xiang Li
Haiyang Yu
Xinghua Zhang
Ziyang Huang
Shizhu He
Kang Liu
Jun Zhao
Fei Huang
Yongbin Li
    LRM

Papers citing "Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns"

19 / 19 papers shown
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute
Yingwei Ma
Binhua Li
Yihong Dong
Xue Jiang
Rongyu Cao
Jingshu Chen
Fei Huang
Yongqian Li
LLMAG, LRM
126
7
0
31 Mar 2025
Process Reward Models for LLM Agents: Practical Framework and Directions
Sanjiban Choudhury
72
11
0
17 Feb 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Mingyang Song
Zhaochen Su
Xiaoye Qu
Jiawei Zhou
Yu Cheng
LRM
136
40
0
06 Jan 2025
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Chujie Zheng
Zizhuo Zhang
Beichen Zhang
Runji Lin
Keming Lu
Bowen Yu
Dayiheng Liu
Jingren Zhou
Junyang Lin
LRM
201
77
0
09 Dec 2024
MTMT: Consolidating Multiple Thinking Modes to Form a Thought Tree for Strengthening LLM
Changcheng Li
Xiangyu Wang
Qiuju Chen
Xiren Zhou
Huanhuan Chen
LRM
68
2
0
05 Dec 2024
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Yantao Liu
Zijun Yao
Rui Min
Yixin Cao
Lei Hou
Juanzi Li
OffRL, ALM
108
42
0
21 Oct 2024
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
An Yang
Beichen Zhang
Binyuan Hui
Bofei Gao
Bowen Yu
...
Mingfeng Xue
Runji Lin
Tianyu Liu
Xingzhang Ren
Zhenru Zhang
OSLM, LRM
125
320
0
18 Sep 2024
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Hritik Bansal
Arian Hosseini
Rishabh Agarwal
Vinh Q. Tran
Mehran Kazemi
SyDa, OffRL, LRM
102
49
0
29 Aug 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Derek F. Wong
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM, LRM
104
31
0
11 Jul 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Hongwei Liu
Zilong Zheng
Yuxuan Qiao
Haodong Duan
Zhiwei Fei
Fengzhe Zhou
Wenwei Zhang
Songyang Zhang
Dahua Lin
Kai-xiang Chen
114
68
0
20 May 2024
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Philipp Mondorf
Barbara Plank
ELM, LRM, LM&MA
158
51
0
02 Apr 2024
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Zicheng Lin
Zhibin Gou
Tian Liang
Ruilin Luo
Haowei Liu
Yujiu Yang
LRM
84
56
0
22 Feb 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM, AIMat
132
282
0
21 Feb 2024
ReFT: Reasoning with Reinforced Fine-Tuning
Trung Quoc Luong
Xinbo Zhang
Zhanming Jie
Peng Sun
Xiaoran Jin
Hang Li
OffRL, LRM, ReLM
100
131
0
17 Jan 2024
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Peiyi Wang
Lei Li
Zhihong Shao
R. X. Xu
Damai Dai
Yifei Li
Deli Chen
Y. Wu
Zhifang Sui
AIMat, LRM, ALM
155
398
0
14 Dec 2023
Large Language Model for Science: A Study on P vs. NP
Qingxiu Dong
Li Dong
Ke Xu
Guangyan Zhou
Y. Hao
Zhifang Sui
Furu Wei
LRM
40
17
0
11 Sep 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM, OffRL, LRM
198
1,240
0
31 May 2023
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL, AI4TS, AI4CE, ALM, AIMat
502
10,526
0
17 Jun 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM, FaML
209
2,407
0
05 Mar 2021