Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.12122
Cited By
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
18 September 2024
An Yang
Beichen Zhang
Binyuan Hui
Bofei Gao
Bowen Yu
Chengpeng Li
Dayiheng Liu
Jianhong Tu
Jingren Zhou
Junyang Lin
Keming Lu
Mingfeng Xue
Runji Lin
Tianyu Liu
Xingzhang Ren
Zhenru Zhang
OSLM
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement"
50 / 127 papers shown
Title
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha
Minwu Kim
Aadim Nepal
Anubhav Shrestha
Keith Ross
OffRL
ReLM
LRM
79
0
0
19 May 2025
ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
LLMSV
179
0
0
18 May 2025
LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation
Junyu Lai
Jiakun Zhang
Shuo Xu
Taolue Chen
Zihang Wang
Yao Yang
Jiarui Zhang
Chun Cao
Jingwei Xu
113
0
0
17 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
163
0
0
16 May 2025
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
Xiwen Chen
Wenhui Zhu
Peijie Qiu
Xuanzhao Dong
Hao Wang
Haiyu Wu
Huayu Li
Aristeidis Sotiras
Yanjie Wang
Abolfazl Razi
ALM
140
0
0
14 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
X. Wu
Weinong Wang
Yingying Zhang
Wenqiang Zhang
ReLM
LRM
156
3
0
12 May 2025
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang
Zijian Huang
Chenshun Ni
Ziyang Xiong
Jiasi Chen
Samet Oymak
ReLM
LRM
160
3
0
12 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
163
2
0
05 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Yelong Shen
OffRL
ReLM
LRM
330
47
0
29 Apr 2025
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Yijiao Wang
Pei Zhang
Jialong Tang
Haoran Wei
Baosong Yang
...
Yanzhe Zhang
Fei Huang
Junyang Lin
Fei Huang
Jingren Zhou
LRM
140
4
0
25 Apr 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu
Shaoyang Guo
Zhuo-Yang Song
Yizhou Sun
Zeyu Cai
...
Ming-xing Luo
Muhan Zhang
Yaodong Yang
Muhan Zhang
Hua Xing Zhu
AIMat
LRM
132
9
0
22 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
Jianmin Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
460
5
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
167
17
0
21 Apr 2025
Teaching Large Language Models to Reason through Learning and Forgetting
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLM
CLL
LRM
474
0
0
15 Apr 2025
Supervised Optimism Correction: Be Confident When LLMs Are Sure
Jing Zhang
Rushuai Yang
Shunyu Liu
Ting-En Lin
Fei Huang
Yi Chen
Yongqian Li
Dacheng Tao
OffRL
89
0
0
10 Apr 2025
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
Longguang Zhong
Fanqi Wan
Ziyi Yang
Guosheng Liang
Tianyuan Shi
Xiaojun Quan
MoMe
124
1
0
09 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang
Haitao Wu
Changqing Zhang
Peilin Zhao
Yatao Bian
ReLM
LRM
170
19
0
08 Apr 2025
Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
Ximing Lu
Seungju Han
David Acuna
Hyunwoo Kim
Jaehun Jung
...
Niklas Muennighoff
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
Yejin Choi
ReLM
LRM
110
7
0
06 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
279
0
0
03 Apr 2025
ToRL: Scaling Tool-Integrated RL
Xuefeng Li
Haoyang Zou
Pengfei Liu
OffRL
LRM
94
14
0
30 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
177
1
0
29 Mar 2025
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
141
0
0
28 Mar 2025
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
Weiqi Li
Xinyu Zhang
Shijie Zhao
Yize Zhang
Junlin Li
Li Zhang
Jian Zhang
107
11
0
28 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
200
137
0
24 Mar 2025
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos
Haolin Yang
Feilong Tang
Ming Hu
Yulong Li
Junjie Guo
...
Zelin Peng
Junjun He
Junjun He
Zongyuan Ge
Imran Razzak
DiffM
VGen
296
2
0
20 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yize Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Zheng Zhang
Yan Huang
Liang Wang
Tieniu Tan
439
4
0
18 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
157
6
0
17 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
169
2
0
16 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Bo Liu
Yunxiang Li
Yangqiu Song
Hanjing Wang
Linyi Yang
...
Jun Wang
Jun Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG
KELM
LRM
AI4CE
132
11
0
12 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Yuhang Liu
Jin Jiang
Hao Fei
Jian Shao
Yueting Zhuang
LRM
ReLM
142
10
0
09 Mar 2025
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning
Chen Li
Yinyi Luo
Anudeep Bolimera
Uzair Ahmed
Siyang Song
Hrishikesh Gokhale
Marios Savvides
LRM
AI4CE
123
1
0
06 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
121
0
0
06 Mar 2025
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Zhangchen Xu
Yang Liu
Yueqin Yin
Mingyuan Zhou
Radha Poovendran
ALM
OffRL
130
18
0
04 Mar 2025
Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu
Yi-Chen Li
Xuqin Zhang
Tianyi Zhang
Yiming Zhang
Zongzhang Zhang
Yang Yu
ALM
109
1
0
01 Mar 2025
MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems
Xinwu Ye
Chengfan Li
Siming Chen
Xiangru Tang
Wei Wei
LRM
85
0
0
27 Feb 2025
Self-Training Elicits Concise Reasoning in Large Language Models
Tergel Munkhbat
Namgyu Ho
S. Kim
Yongjin Yang
Yujin Kim
Se-Young Yun
ReLM
LRM
168
36
0
27 Feb 2025
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Weizhe Yuan
Jane Dwivedi-Yu
Song Jiang
Karthik Padthe
Yang Li
...
Ilia Kulikov
Kyunghyun Cho
Yuandong Tian
Jason Weston
Xian Li
ReLM
LRM
162
20
0
18 Feb 2025
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao
Geyang Guo
Xingxing Zhang
Nancy F. Chen
Shafiq Joty
Furu Wei
LRM
210
16
0
17 Feb 2025
AURORA:Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification
Jue Chen
Tianchu Yao
Chao Qu
Bin Li
Minghao Yang
...
Haozhe Wang
Xihe Qiu
Wei Chu
Yinghui Xu
Yuan Qi
OffRL
LRM
106
2
0
17 Feb 2025
Small Models Struggle to Learn from Strong Reasoners
Yuetai Li
Xiang Yue
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Bill Yuchen Lin
Bhaskar Ramasubramanian
Radha Poovendran
LRM
126
31
0
17 Feb 2025
Evaluating Step-by-step Reasoning Traces: A Survey
Jinu Lee
Julia Hockenmaier
LRM
ELM
153
2
0
17 Feb 2025
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
Yuchen Yan
Yongliang Shen
Yang Liu
Jin Jiang
Xin Xu
Hao Fei
Jian Shao
Yueting Zhuang
ReLM
LRM
102
2
0
17 Feb 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Yongqian Li
W. Li
Zejun Ma
Chao Zhang
LRM
MLLM
VLM
126
5
0
17 Feb 2025
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Xinyu Zhang
Yuxuan Dong
Yongpeng Wu
Jiaxing Huang
Chengyou Jia
Basura Fernando
Mike Zheng Shou
Lingling Zhang
Jun Liu
AIMat
ReLM
LRM
112
13
0
17 Feb 2025
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
Xin Xu
Yan Xu
Tianhao Chen
Yuchen Yan
Chengwu Liu
...
Yansen Wang
Yichun Yin
Yijiao Wang
Lifeng Shang
Qiang Liu
LRM
164
3
0
17 Feb 2025
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Yunhua Zhou
Xipeng Qiu
LRM
178
20
0
17 Feb 2025
Dyve: Thinking Fast and Slow for Dynamic Process Verification
Jianyuan Zhong
Zhiyu Li
Zhijian Xu
Xiangyu Wen
Qiang Xu
LRM
82
4
0
16 Feb 2025
CMCTS: A Constrained Monte Carlo Tree Search Framework for Mathematical Reasoning in Large Language Model
Qingwen Lin
Boyan Xu
Zijian Li
Zijian Li
Keli Zhang
Ruichu Cai
Ruichu Cai
LRM
107
3
0
16 Feb 2025
Learning to Reason from Feedback at Test-Time
Yanyang Li
Michael R. Lyu
Liwei Wang
LRM
120
4
0
16 Feb 2025
RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning
Junhao Hu
Wenrui Huang
Weidong Wang
Zhenwen Li
Tiancheng Hu
Zhixia Liu
Xusheng Chen
Tao Xie
Yizhou Shan
LRM
138
2
0
16 Feb 2025
Previous
1
2
3
Next