Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.03300
Cited By
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
5 February 2024
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
Xiao Bi
Haowei Zhang
Mingchuan Zhang
Y. K. Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models"
50 / 203 papers shown
Title
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun
Ge Yan
Tsui-Wei Weng
KELM
LRM
65
0
0
27 Mar 2025
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning
Zhengxi Lu
Yuxiang Chai
Yaxuan Guo
Xi Yin
Liang Liu
Hao Wang
Han Xiao
Shuai Ren
Guanjing Xiong
Yiming Li
LLMAG
LRM
80
11
0
27 Mar 2025
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
M. Ben-Chen
Tianpeng Li
Haoze Sun
Yijie Zhou
Chenzheng Zhu
...
Xin Wu
Haofen Wang
Jeff Z. Pan
Wen Zhang
H. Chen
ReLM
OffRL
AI4TS
LRM
72
10
0
25 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
91
38
0
24 Mar 2025
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning
Alan Dao
Dinh Bach Vu
Bui Quang Huy
57
0
0
24 Mar 2025
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
Mingyang Song
Mao Zheng
Zheng Li
Wenjie Yang
Xuan Luo
Yue Pan
Feng Zhang
ReLM
LRM
86
5
0
21 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen
Zhong
Hanjie Chen
OffRL
ReLM
LRM
83
31
0
20 Mar 2025
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Yuxiang Lai
Shitian Zhao
Ming Li
Jike Zhong
Xiaofeng Yang
OffRL
LRM
LM&MA
VLM
81
9
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yuyao Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
Tieniu Tan
182
2
0
18 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
91
2
0
18 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jun Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
62
5
0
18 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
1
0
16 Mar 2025
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
Gang Li
Jizhong Liu
Heinrich Dinkel
Yadong Niu
Junbo Zhang
Jian Luan
OffRL
LRM
ReLM
71
7
0
14 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Dong Wang
Sercan Ö. Arik
Dong Wang
Hamed Zamani
J. Han
RALM
ReLM
KELM
OffRL
AI4TS
LRM
84
29
0
12 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Yuhang Liu
Jin Jiang
Hao Fei
Jian Shao
Yueting Zhuang
LRM
ReLM
53
3
0
09 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
59
45
0
09 Mar 2025
Reinforcement Learning with Verifiable Rewards: GRPO's Effective Loss, Dynamics, and Success Amplification
Youssef Mroueh
OffRL
41
5
0
09 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
109
2
0
07 Mar 2025
Multi-Fidelity Policy Gradient Algorithms
Xinjie Liu
Cyrus Neary
Kushagra Gupta
Christian Ellis
Ufuk Topcu
David Fridovich-Keil
OffRL
170
0
0
07 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
60
0
0
06 Mar 2025
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning
Chen Li
Yinyi Luo
Anudeep Bolimera
Uzair Ahmed
Shri Kiran Srinivasan
Hrishikesh Gokhale
Marios Savvides
LRM
AI4CE
65
1
0
06 Mar 2025
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang
Ziming Cheng
Junting Pan
Zhaohui Hou
Mingjie Zhan
LLMAG
101
2
0
05 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM
LRM
93
31
0
03 Mar 2025
CE-U: Cross Entropy Unlearning
Bo Yang
MU
53
0
0
03 Mar 2025
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Yichang Xu
Ling Liu
53
10
0
01 Mar 2025
Self-Training Elicits Concise Reasoning in Large Language Models
Tergel Munkhbat
Namgyu Ho
S. Kim
Yongjin Yang
Yujin Kim
Se-Young Yun
ReLM
LRM
64
12
0
27 Feb 2025
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Minggui He
Yilun Liu
Shimin Tao
Yuanchang Luo
Hongyong Zeng
...
Daimeng Wei
Weibin Meng
Hao Yang
Boxing Chen
Osamu Yoshie
LRM
71
3
0
27 Feb 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRM
LM&MA
VLM
73
14
0
26 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
M. Lee
Shinbok Lee
Gaeun Seo
98
1
0
26 Feb 2025
What is the Alignment Objective of GRPO?
Milan Vojnovic
Se-Young Yun
70
2
0
25 Feb 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
75
3
0
24 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
154
1
0
22 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
214
1
0
21 Feb 2025
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation
Junchen Fu
Xuri Ge
Kaiwen Zheng
Ioannis Arapakis
Xin Xin
J. Jose
87
1
0
20 Feb 2025
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
C. Xie
Shuo Cai
Wenjun Wang
Pengxiang Li
Zhijie Sang
...
Xiaotian Han
Jianbo Yuan
Shengyu Zhang
Fei Wu
Hongxia Yang
LRM
51
1
0
17 Feb 2025
Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions
Lan Zhang
Marco Valentino
André Freitas
49
0
0
17 Feb 2025
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
104
15
0
17 Feb 2025
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
Xin Xu
Yan Xu
Tianhao Chen
Yuchen Yan
Chengwu Liu
...
Yansen Wang
Yichun Yin
Yijiao Wang
Lifeng Shang
Qiang Liu
LRM
75
2
0
17 Feb 2025
Small Models Struggle to Learn from Strong Reasoners
Yuetai Li
Xiang Yue
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Bill Yuchen Lin
Bhaskar Ramasubramanian
Radha Poovendran
LRM
46
12
0
17 Feb 2025
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models
Haoyang Li
Xuejia Chen
Zhanchao Xu
Darian Li
Nicole Hu
...
Heng Chang
Luyu Qiu
C. Zhang
Qing Li
Lei Chen
LRM
ELM
42
1
0
16 Feb 2025
Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning
Qingwen Lin
Boyan Xu
Zijian Li
Zhifeng Hao
Keli Zhang
Ruichu Cai
LRM
52
2
0
16 Feb 2025
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang
Jiacheng Guo
Zihao Li
X. Ji
Jiawei Ge
...
Yangsibo Huang
Chi Jin
Xinyun Chen
Chiyuan Zhang
Mengdi Wang
AAML
LRM
102
9
0
10 Feb 2025
Examining False Positives under Inference Scaling for Mathematical Reasoning
Yu Guang Wang
Nan Yang
Liang Wang
Furu Wei
LRM
67
3
0
10 Feb 2025
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
Yuhui Chen
Shuai Tian
Shugao Liu
Yingting Zhou
Haoran Li
Dongbin Zhao
OffRL
106
1
0
08 Feb 2025
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Xiaoyang Liu
Kangjie Bao
Jiashuo Zhang
Yunqi Liu
Yu Chen
Yu Chen
Yang Jiao
Tao Luo
AIMat
55
0
0
08 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
61
2
0
07 Feb 2025
Optimizing Temperature for Language Models with Multi-Sample Inference
Weihua Du
Yiming Yang
Sean Welleck
62
2
0
07 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
127
9
0
05 Feb 2025
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods
Oussama Zekri
Nicolas Boullé
DiffM
68
3
0
03 Feb 2025
ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Format Restriction, and Column Exploration
Minghang Deng
Ashwin Ramachandran
Canwen Xu
Lanxiang Hu
Zhewei Yao
Anupam Datta
Hao Zhang
LMTD
137
1
0
02 Feb 2025
Previous
1
2
3
4
5
Next