2501.04519
Cited By
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
8 January 2025
Xinyu Guan
Lefei Zhang
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM
SyDa
ReLM
Papers citing
"rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking"
50 / 62 papers shown
Title
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Dejing Dou
LRM
23
0
0
20 May 2025
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
Yaorui Shi
Shihan Li
Chang Wu
Zhiyuan Liu
Junfeng Fang
Hengxing Cai
An Zhang
Xinbing Wang
ReLM
LRM
36
0
0
16 May 2025
Token-Level Uncertainty Estimation for Large Language Model Reasoning
Tunyu Zhang
Haizhou Shi
Yibin Wang
Hengyi Wang
Xiaoxiao He
...
Ligong Han
Kai Xu
Huatian Zhang
Dimitris N. Metaxas
Hao Wang
LRM
14
0
0
16 May 2025
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
Xiwen Chen
Wenhui Zhu
Peijie Qiu
Xuanzhao Dong
Hao Wang
Haiyu Wu
Huayu Li
Aristeidis Sotiras
Yanjie Wang
Abolfazl Razi
ALM
42
0
0
14 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
X. Wu
Weinong Wang
Yingying Zhang
Wenqiang Zhang
ReLM
LRM
43
0
0
12 May 2025
Chain-of-Thought Tokens are Computer Program Variables
Fangwei Zhu
Peiyi Wang
Zhifang Sui
LRM
44
0
0
08 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
76
2
0
05 May 2025
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
92
3
0
30 Apr 2025
IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery
Aniketh Garikaparthi
Manasi S. Patwardhan
L. Vig
Arman Cohan
VLM
LRM
69
0
0
23 Apr 2025
OptimAI: Optimization from Natural Language Using LLM-Powered AI Agents
Raghav Thind
Youran Sun
Ling Liang
Haizhao Yang
LLMAG
36
0
0
23 Apr 2025
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
51
1
0
22 Apr 2025
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning
Zhaopeng Feng
Shaosheng Cao
Jiahan Ren
Jiayuan Su
Ruizhe Chen
Yan Zhang
Zhe Xu
Yao Hu
Jian Wu
Zuozhu Liu
ALM
LRM
63
3
0
14 Apr 2025
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Xingjian Zhang
Siwei Wen
Wenjun Wu
Lei Huang
LRM
40
2
0
13 Apr 2025
A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions
Chengyu Wang
Taolin Zhang
Richang Hong
Jun Huang
ReLM
LRM
45
1
0
12 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
Cengiz Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
159
2
0
10 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang
Haitao Wu
Changqing Zhang
Peilin Zhao
Yatao Bian
ReLM
LRM
87
5
0
08 Apr 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Wenbo Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
75
3
0
27 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
190
3
0
26 Mar 2025
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
Mingyang Song
Mao Zheng
Zheng Li
Wenjie Yang
Xuan Luo
Yue Pan
Feng Zhang
ReLM
LRM
86
7
0
21 Mar 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan
Yu Li
Honglin Lin
Qizhi Pei
Zinan Tang
Wei Wu
Chenlin Ming
H. Vicky Zhao
Zeang Sheng
Lijun Wu
LRM
59
2
0
21 Mar 2025
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu
Yan Zheng
Rong Cheng
Qiyu Wu
Wei Guo
...
Hebin Liang
Yifu Yuan
Hangyu Mao
Fuzheng Zhang
Jianye Hao
LRM
AI4CE
66
1
0
20 Mar 2025
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
Gaole Dai
Shiqi Jiang
Ting Cao
Yuanchun Li
Yuqing Yang
Rui Tan
Mo Li
Lili Qiu
54
2
0
20 Mar 2025
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal
Vaibhav Aggarwal
Ojasv Kamal
Abhinav Japesh
Zhijing Jin
Bernhard Schölkopf
52
1
0
18 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
63
1
0
18 Mar 2025
ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Fangzhi Xu
Hang Yan
Chang Ma
Haiteng Zhao
Jun Liu
Qika Lin
Zhiyong Wu
58
2
0
17 Mar 2025
ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs
Pengcheng Wen
Yalan Qin
Chi-Min Chan
Juntao Dai
Chongye Guo
Yaodong Yang
Sirui Han
Yike Guo
LLMAG
LRM
79
2
0
17 Mar 2025
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts
Qin Liu
Wenxuan Zhou
Nan Xu
James Y. Huang
Fei Wang
Sheng Zhang
Hoifung Poon
Mengzhao Chen
LLMAG
ReLM
AI4Cl
LRM
100
1
0
17 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
58
3
0
17 Mar 2025
MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling
Zhaopeng Feng
Jiahan Ren
Jiayuan Su
Jiamei Zheng
Zhihang Tang
Hongwei Wang
Zuozhu Liu
LRM
65
1
0
15 Mar 2025
Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering
Xinyu Tang
Xiaolei Wang
Zhihao Lv
Yingqian Min
Wayne Xin Zhao
Binbin Hu
Ziqi Liu
Qing Cui
LRM
84
4
0
14 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
48
5
0
13 Mar 2025
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
Sara Rajaee
Kumar Pratik
Gabriele Cesa
Arash Behboodi
OffRL
LRM
61
0
0
12 Mar 2025
DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
Minjun Zhu
Yixuan Weng
Linyi Yang
Yue Zhang
ALM
LRM
68
3
0
11 Mar 2025
HELM: Human-Preferred Exploration with Language Models
Shuhao Liao
Xuxin Lv
Yuhong Cao
Jeric Lew
Wenjun Wu
Guillaume Sartoretti
48
0
0
10 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Yuhang Liu
Jin Jiang
Hao Fei
Jian Shao
Yueting Zhuang
LRM
ReLM
53
3
0
09 Mar 2025
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4
Jiarui Yao
Ruida Wang
Tong Zhang
LRM
65
0
0
05 Mar 2025
An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
Wei Sun
Qianlong Du
Fuwei Cui
Jiajun Zhang
OffRL
LRM
42
0
0
04 Mar 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
Jun Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Zhenru Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM
LRM
86
9
0
26 Feb 2025
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang
Shuming Ma
Yankai Lin
Furu Wei
LRM
53
24
0
25 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
44
6
0
24 Feb 2025
Learning to Retrieve and Reason on Knowledge Graph through Active Self-Reflection
Han Zhang
Langshi Zhou
Hanfang Yang
LRM
RALM
ReLM
KELM
224
1
0
24 Feb 2025
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Alon Albalak
Duy Phung
Nathan Lile
Rafael Rafailov
Kanishk Gandhi
...
Anikait Singh
Chase Blagden
Violet Xiang
Dakota Mahan
Nick Haber
OffRL
LRM
53
7
0
24 Feb 2025
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Guijin Son
Jiwoo Hong
Hyunwoo Ko
James Thorne
LRM
53
8
0
24 Feb 2025
S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma
Peisong Wang
Cheng Liu
Xingyan Liu
Jiaqi Chen
Bang Zhang
Xin Zhou
Nan Du
Jia Li
LRM
62
2
0
18 Feb 2025
Towards Reasoning Ability of Small Language Models
Gaurav Srivastava
Shuxiang Cao
Xuan Wang
ReLM
LRM
60
7
0
17 Feb 2025
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
Yuchen Yan
Yongliang Shen
Yang Liu
Jin Jiang
Xin Xu
Hao Fei
Jian Shao
Yueting Zhuang
ReLM
LRM
53
2
0
17 Feb 2025
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Zhenfang Chen
Delin Chen
Rui Sun
Wenjun Liu
Chuang Gan
LLMAG
64
3
0
17 Feb 2025
Learning to Reason from Feedback at Test-Time
Yanyang Li
M. Lyu
Liwei Wang
LRM
36
1
0
16 Feb 2025
Dyve: Thinking Fast and Slow for Dynamic Process Verification
Qiang Xu
Zhiyu Li
Zhijian Xu
Xiangyu Wen
Qiang Xu
LRM
38
3
0
16 Feb 2025
An Interpretable Automated Mechanism Design Framework with Large Language Models
Jiayuan Liu
Mingyu Guo
Vincent Conitzer
79
0
0
16 Feb 2025