2409.12122
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
18 September 2024
An Yang
Beichen Zhang
Binyuan Hui
Bofei Gao
Bowen Yu
Chengpeng Li
Dayiheng Liu
Jianhong Tu
Jingren Zhou
Junyang Lin
Keming Lu
Mingfeng Xue
Runji Lin
Tianyu Liu
Xingzhang Ren
Zhenru Zhang
OSLM
LRM
Papers citing
"Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement"
50 / 127 papers shown
Title
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi
Fan Nie
Alexandre Alahi
James Zou
Himabindu Lakkaraju
Yilun Du
Eric P. Xing
Sham Kakade
Hanlin Zhang
23
1
0
19 Jun 2025
Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning
Sam Silver
Jimin Sun
Ivan Zhang
Sara Hooker
Eddie Kim
KELM
ReLM
LRM
20
0
0
18 Jun 2025
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Ankan Deria
Adinath Madhavrao Dukre
Feilong Tang
Sara Atito
Sudipta Roy
Muhammad Awais
Muhammad Haris Khan
Imran Razzak
VLM
40
0
0
18 Jun 2025
Spurious Rewards: Rethinking Training Signals in RLVR
Rulin Shao
Shuyue Stella Li
Rui Xin
Scott Geng
Yiping Wang
...
Ranjay Krishna
Yulia Tsvetkov
Hannaneh Hajishirzi
Pang Wei Koh
Luke Zettlemoyer
OffRL
ReLM
LRM
127
11
0
12 Jun 2025
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling
Liran Ringel
Elad Tolochinsky
Yaniv Romano
LRM
15
0
0
12 Jun 2025
CoRT: Code-integrated Reasoning within Thinking
Chengpeng Li
Zhengyang Tang
Ziniu Li
Mingfeng Xue
Keqin Bao
...
Ruoyu Sun
Benyou Wang
Xiang Wang
Junyang Lin
Dayiheng Liu
LLMAG
OffRL
ReLM
LRM
68
0
0
11 Jun 2025
Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training
Shurui Gui
Shuiwang Ji
LRM
65
0
0
11 Jun 2025
Can A Gamer Train A Mathematical Reasoning Model?
Andrew Shin
ReLM
LRM
34
0
0
10 Jun 2025
Learning to Reason Across Parallel Samples for LLM Reasoning
Jianing Qi
Xi Ye
Hao Tang
Zhigang Zhu
Eunsol Choi
ReLM
LRM
17
0
0
10 Jun 2025
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Polina Kirichenko
Mark Ibrahim
Kamalika Chaudhuri
Samuel J. Bell
LRM
25
0
0
10 Jun 2025
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Lu Ma
Hao Liang
Meiyi Qiang
Lexiang Tang
Xiaochen Ma
...
Junbo Niu
Chengyu Shen
Runming He
Bin Cui
Wentao Zhang
ReLM
OffRL
LRM
22
0
0
09 Jun 2025
Mathesis: Towards Formal Theorem Proving from Natural Languages
Yu Xuejun
Jianyuan Zhong
Zijin Feng
Pengyi Zhai
Roozbeh Yousefzadeh
...
Dongcai Lu
Jiacheng Sun
Q. Xu
Shen Xin
Zhenguo Li
AIMat
OffRL
LRM
20
0
0
08 Jun 2025
Boosting LLM Reasoning via Spontaneous Self-Correction
Xutong Zhao
Tengyu Xu
Xuewei Wang
Zhengxing Chen
Di Jin
...
Yun He
Sinong Wang
Han Fang
Sarath Chandar
Chen Zhu
ReLM
LRM
KELM
17
0
0
07 Jun 2025
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
Chaoyang Wang
Zeyu Zhang
Haiyun Jiang
OffRL
LRM
25
0
0
07 Jun 2025
EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation
Jinghan Jia
Hadi Reisizadeh
Chongyu Fan
Nathalie Baracaldo
Mingyi Hong
Sijia Liu
LRM
131
0
0
04 Jun 2025
Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning
Muling Wu
Qi Qian
Wenhao Liu
Xiaohua Wang
Z. Huang
...
Zhibo Xu
Lina Chen
Tianlong Li
Xiaoqing Zheng
Xuanjing Huang
LRM
94
0
0
04 Jun 2025
FreePRM: Training Process Reward Models Without Ground Truth Process Labels
Lin Sun
C. Liu
Xiaofeng Ma
Tao Yang
Weijia Lu
Ning Wu
63
0
0
04 Jun 2025
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
Zhuohan Xie
Dhruv Sahnan
Debopriyo Banerjee
Georgi Georgiev
Rushil Thareja
...
Ivan Koychev
Tanmoy Chakraborty
Salem Lahlou
Veselin Stoyanov
Preslav Nakov
ReLM
LRM
72
0
0
03 Jun 2025
BNPO: Beta Normalization Policy Optimization
Changyi Xiao
Mengdi Zhang
Yixin Cao
OffRL
56
0
0
03 Jun 2025
Incentivizing LLMs to Self-Verify Their Answers
Fuxiang Zhang
Jiacheng Xu
Chaojie Wang
Ce Cui
Yang Liu
Bo An
ReLM
LRM
54
0
0
02 Jun 2025
AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits
Yichen Shi
Ze Zhang
Hongyang Wang
Zhuofu Tao
Zhongyi Li
Bingyu Chen
Yaxin Wang
Zhiping Yu
Ting-Jung Lin
Lei He
23
0
0
30 May 2025
Towards Effective Code-Integrated Reasoning
Fei Bai
Yingqian Min
Beichen Zhang
Zhipeng Chen
Wayne Xin Zhao
Lei Fang
Zheng Liu
Zhongyuan Wang
Ji-Rong Wen
OffRL
LRM
20
0
0
30 May 2025
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu
Jiaxuan Gao
Xujie Shen
Chen Zhu
Zhiyu Mei
...
Jun Mei
Jiashu Wang
Tongkai Yang
Binhang Yuan
Yi Wu
OffRL
SyDa
LRM
57
0
0
30 May 2025
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns
Xiang Li
Haiyang Yu
Xinghua Zhang
Ziyang Huang
Shizhu He
Kang Liu
Jun Zhao
Fei Huang
Yongbin Li
LRM
32
0
0
29 May 2025
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Jian Yao
Ran Cheng
Xingyu Wu
Jibin Wu
Kay Chen Tan
LRM
97
0
0
29 May 2025
Probability-Consistent Preference Optimization for Enhanced LLM Reasoning
Yunqiao Yang
Houxing Ren
Zimu Lu
Ke Wang
Weikang Shi
A-Long Zhou
Junting Pan
Mingjie Zhan
Hongsheng Li
LRM
52
0
0
29 May 2025
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Hao Li
He Cao
Bin Feng
Yanjun Shao
Xiangru Tang
Zhiyuan Yan
Li Yuan
Yonghong Tian
Yu-Feng Li
LRM
ELM
73
0
0
27 May 2025
Can Large Reasoning Models Self-Train?
Sheikh Shafayat
Fahim Tajwar
Ruslan Salakhutdinov
J. Schneider
Andrea Zanette
ReLM
OffRL
LRM
76
2
0
27 May 2025
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Jun Rao
Min Zhang
OffRL
LRM
80
0
0
26 May 2025
Adaptive Deep Reasoning: Triggering Deep Thinking When Needed
Yunhao Wang
Yuhao Zhang
T. Yu
Can Xu
Feng Zhang
Fengzong Lian
OffRL
LRM
32
0
0
26 May 2025
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision
Tej Deep Pala
Panshul Sharma
Amir Zadeh
Chuan Li
Soujanya Poria
LRM
51
0
0
26 May 2025
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
C. Wang
Xiaoran Pan
Zihao Pan
Haofan Wang
Yiren Song
LRM
134
0
0
24 May 2025
ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models
Hao Chen
Haoze Li
Zhiqing Xiao
Lirong Gao
Qi Zhang
Xiaomeng Hu
Ningtao Wang
Xing Fu
Junbo Zhao
202
0
0
24 May 2025
From Mathematical Reasoning to Code: Generalization of Process Reward Models in Test-Time Scaling
Zhengyu Chen
Yudong Wang
Teng Xiao
Ruochen Zhou
X. Yang
Wei Wang
Zhifang Sui
Jingang Wang
LRM
32
0
0
24 May 2025
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Xiaojun Guo
Ang Li
Yifei Wang
Stefanie Jegelka
Yisen Wang
OffRL
ReLM
LRM
99
0
0
24 May 2025
Steering LLM Reasoning Through Bias-Only Adaptation
Viacheslav Sinii
Alexey Gorbatovski
Artem Cherepanov
Boris Shaposhnikov
Nikita Balagansky
Daniil Gavrilov
LLMSV
LRM
23
0
0
24 May 2025
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL
LRM
249
3
0
23 May 2025
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
Yutong Chen
Jiandong Gao
Ji Wu
ALM
207
0
0
23 May 2025
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Fanqi Wan
Weizhou Shen
Shengyi Liao
Yingcheng Shi
Chenliang Li
Ziyi Yang
Ji Zhang
Fei Huang
Jingren Zhou
Ming Yan
OffRL
LLMAG
ReLM
LRM
108
0
0
23 May 2025
Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning
Jun Rao
Xuebo Liu
Hexuan Deng
Zepeng Lin
Zixiong Yu
Jiansheng Wei
Xiaojun Meng
Min Zhang
LRM
213
0
0
22 May 2025
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning
Shicheng Xu
Liang Pang
Yunchang Zhu
Jia Gu
Zihao Wei
Jingcheng Deng
Feiyang Pan
Huawei Shen
Xueqi Cheng
OffRL
LRM
104
0
0
22 May 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
98
0
0
22 May 2025
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
NovelSeek Team
Bo Zhang
Shiyang Feng
Xiangchao Yan
Jiakang Yuan
...
Zhongying Tu
Xiangyu Yue
W. Ouyang
Bowen Zhou
Lei Bai
LLMAG
110
2
0
22 May 2025
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities
Jinyang Wu
Chonghua Liao
Mingkuan Feng
Shuai Zhang
Zhengqi Wen
Pengpeng Shao
Huazhe Xu
Jianhua Tao
LRM
OffRL
143
3
0
21 May 2025
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Yuchen Yan
Jin Jiang
Zhenbang Ren
Yijun Li
Xudong Cai
...
Mengdi Zhang
Jian Shao
Yongliang Shen
Jun Xiao
Yueting Zhuang
OffRL
ALM
LRM
130
0
0
21 May 2025
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Shivam Agarwal
Zimin Zhang
Lifan Yuan
Jiawei Han
Hao Peng
157
8
0
21 May 2025
SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning
Xiong Jun Wu
Zhenduo Zhang
ZuJie Wen
Zhiqiang Zhang
Wang Ren
...
Xudong Han
Chengfu Tang
Dingnan Jin
Qing Cui
Jun Zhou
LRM
215
1
0
20 May 2025
General-Reasoner: Advancing LLM Reasoning Across All Domains
Xueguang Ma
Qian Liu
Dongfu Jiang
Ge Zhang
Zejun Ma
Wenhu Chen
AI4CE
LRM
106
6
0
20 May 2025
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang
Xun Wu
Shaohan Huang
Qingxiu Dong
Zewen Chi
Li Dong
Xingxing Zhang
Tengchao Lv
Lei Cui
Furu Wei
OffRL
LRM
155
5
0
20 May 2025
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Yanggan Gu
Zhaoyi Yan
Yuanyi Wang
Yiming Zhang
Qi Zhou
Leilei Gan
Hongxia Yang
72
0
0
20 May 2025