ResearchTrend.AI
arXiv: 2505.22203
Pitfalls of Rule- and Model-based Verifiers -- A Case Study on Mathematical Reasoning


28 May 2025
Yuzhen Huang
Weihao Zeng
Xingshan Zeng
Qi Zhu
Junxian He
    LRM

Papers citing "Pitfalls of Rule- and Model-based Verifiers -- A Case Study on Mathematical Reasoning"

7 / 7 papers shown
General-Reasoner: Advancing LLM Reasoning Across All Domains
Xueguang Ma
Qian Liu
Dongfu Jiang
Ge Zhang
Zejun Ma
Wenhu Chen
AI4CE
LRM
37
4
0
20 May 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
109
20
0
24 Apr 2025
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Ding Chen
Qingchen Yu
P. Wang
Wentao Zhang
Simin Niu
Feiyu Xiong
Xiaochen Li
Minchuan Yang
Zhiyu Li
ALM
LRM
82
4
0
14 Apr 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
105
85
0
24 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
88
131
0
18 Mar 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
99
24
0
14 Mar 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL
ALM
AI4TS
VLM
LRM
133
231
0
22 Jan 2025