2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Z. Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Jianxin Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Renqi Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 788 papers shown
Context-Free Synthetic Data Mitigates Forgetting
Parikshit Bansal
Sujay Sanghavi
CLL
25
0
0
20 May 2025
Can Large Language Models Really Recognize Your Name?
Dzung Pham
Peter Kairouz
Niloofar Mireshghallah
Eugene Bagdasarian
Chau Minh Pham
Amir Houmansadr
PILM
18
0
0
20 May 2025
RLVR-World: Training World Models with Reinforcement Learning
Jialong Wu
Shaofeng Yin
Ningya Feng
Mingsheng Long
OffRL
VGen
7
0
0
20 May 2025
DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery
Kun Li
Zhennan Wu
Shoupeng Wang
Wenbin Hu
LLMAG
LM&MA
8
0
0
20 May 2025
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang
Xun Wu
Shaohan Huang
Qingxiu Dong
Zewen Chi
Li Dong
Xingxing Zhang
Tengchao Lv
Lei Cui
Furu Wei
OffRL
LRM
12
0
0
20 May 2025
SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning
Xiong Jun Wu
Zhenduo Zhang
Zujie Wen
Zhiqiang Zhang
Wang Ren
...
C. Chen
Deng Zhao
Dingnan Jin
Qing Cui
Jun Zhou
LRM
7
0
0
20 May 2025
Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
Siddhant Bhambri
Upasana Biswas
Subbarao Kambhampati
7
0
0
20 May 2025
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Minwu Kim
Anubhav Shrestha
Safal Shrestha
Aadim Nepal
Keith Ross
7
0
0
20 May 2025
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
LRM
7
0
0
20 May 2025
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Dejing Dou
LRM
23
0
0
20 May 2025
FLASH-D: FlashAttention with Hidden Softmax Division
K. Alexandridis
Vasileios Titopoulos
G. Dimitrakopoulos
7
0
0
20 May 2025
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
Jiaer Xia
Yuhang Zang
Peng Gao
Yixuan Li
Kaiyang Zhou
OffRL
ReLM
AI4TS
VLM
LRM
7
0
0
20 May 2025
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications
Haijun Li
Tianqi Shi
Zifu Shang
Yuxuan Han
Xueyu Zhao
...
Longyue Wang
Gongbo Tang
Weihua Luo
Zhao Xu
Kaifu Zhang
ELM
12
0
0
20 May 2025
General-Reasoner: Advancing LLM Reasoning Across All Domains
Xueguang Ma
Qian Liu
Dongfu Jiang
Ge Zhang
Z. Ma
Wenhu Chen
LRM
AI4CE
7
0
0
20 May 2025
Improved Methods for Model Pruning and Knowledge Distillation
Wei Jiang
Anying Fu
Youling Zhang
VLM
9
0
0
20 May 2025
ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
Bufang Yang
Lilin Xu
Liekang Zeng
Kaiwei Liu
Siyang Jiang
Wenrui Lu
Hongkai Chen
Xiaofan Jiang
Guoliang Xing
Zhenyu Yan
LLMAG
9
0
0
20 May 2025
Let LLMs Break Free from Overthinking via Self-Braking Tuning
Haoran Zhao
Yuchen Yan
Yongliang Shen
Haolei Xu
Wenqi Zhang
Kaitao Song
Jian Shao
Weiming Lu
Jun Xiao
Yueting Zhuang
LRM
9
0
0
20 May 2025
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung
Sangyeon Yoon
Minsuk Kahng
Albert No
LRM
LLMSV
12
0
0
20 May 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw
ELM
26
0
0
19 May 2025
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
Chenlin Ming
Chendi Qu
Mengzhang Cai
Qizhi Pei
Zhuoshi Pan
Yu Li
Xiaoming Duan
Lijun Wu
Zeang Sheng
12
0
0
19 May 2025
Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang
Jie Lu
En Yu
12
0
0
19 May 2025
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
Haiquan Wen
Yiwei He
Zhenglin Huang
Tianxiao Li
Zihan YU
Xingru Huang
Lu Qi
Baoyuan Wu
Xuelong Li
Guangliang Cheng
VGen
9
0
0
19 May 2025
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
Haoyuan Wu
Rui Ming
Jilong Gao
Hangyu Zhao
Xueyi Chen
Yikai Yang
Haisheng Zheng
Zhuolun He
Bei Yu
16
0
0
19 May 2025
RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
Qiguang Chen
Libo Qin
Jinhao Liu
Yue Liao
Jiaqi Wang
Jingxuan Zhou
Wanxiang Che
LRM
12
0
0
19 May 2025
ReEx-SQL: Reasoning with Execution-Aware Reinforcement Learning for Text-to-SQL
Yaxun Dai
Wenxuan Xie
Xialie Zhuang
Tianyu Yang
Yiying Yang
Haiqin Yang
Yuhang Zhao
Pingfu Chao
Wenhao Jiang
ReLM
LRM
27
0
0
19 May 2025
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities
Haoyu Zhao
Yihan Geng
Shange Tang
Yong Lin
Bohan Lyu
Hongzhou Lin
Chi Jin
Sanjeev Arora
9
0
0
19 May 2025
Optimizing Retrieval Augmented Generation for Object Constraint Language
Kevin Chenhao Li
Vahid Zolfaghari
Nenad Petrovic
Fengjunjie Pan
Alois Knoll
2
0
0
19 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Y. Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
17
0
0
19 May 2025
CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
Guoheng Sun
Ziyao Wang
Bowei Tian
Meng Liu
Zheyu Shen
Shwai He
Yexiao He
Wanghao Ye
Yiting Wang
Ang Li
LRM
7
0
0
19 May 2025
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu
Xueyi Chen
Rui Ming
Jilong Gao
Shoubo Hu
Zhuolun He
Bei Yu
LRM
24
0
0
19 May 2025
Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective
Zhongxiang Sun
Qipeng Wang
Haoyu Wang
Xiao Zhang
Jun Xu
HILM
LRM
9
0
0
19 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
16
0
0
19 May 2025
R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
Ali Naseh
Harsh Chaudhari
Jaechul Roh
Mingshi Wu
Alina Oprea
Amir Houmansadr
AAML
ELM
12
0
0
19 May 2025
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
Hengli Li
Chenxi Li
Tong Wu
Xuekai Zhu
Yuxuan Wang
...
Eric Hanchen Jiang
Song-Chun Zhu
Zixia Jia
Ying Nian Wu
Zilong Zheng
LRM
12
0
0
19 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
12
0
0
19 May 2025
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Penghui Qi
Zichen Liu
Tianyu Pang
Chao Du
W. Lee
Min Lin
OffRL
LRM
12
0
0
19 May 2025
Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately
Yuhang Wang
Youhe Jiang
Bin Cui
Fangcheng Fu
LRM
7
0
0
19 May 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Ziyang Ma
Yinghao Ma
Yanqiao Zhu
Chen Yang
Yi-Wen Chao
...
Wei Xue
Emmanouil Benetos
Kai Yu
Eng Siong Chng
Xie Chen
AuLLM
LRM
12
0
0
19 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM
LRM
7
0
0
19 May 2025
Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference
Jin Du
Li Chen
Xun Xian
An Luo
Fangqiao Tian
Ganghua Wang
Charles Doss
Xiaotong Shen
Jie Ding
CML
ELM
8
0
0
19 May 2025
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Soumya Rani Samineni
Durgesh Kalwar
Karthik Valmeekam
Kaya Stechly
Subbarao Kambhampati
OffRL
4
0
0
19 May 2025
Thinkless: LLM Learns When to Think
Gongfan Fang
Xinyin Ma
Xinchao Wang
LLMAG
OffRL
ReLM
LRM
6
0
0
19 May 2025
AdaptThink: Reasoning Models Can Learn When to Think
J. Zhang
Nianyi Lin
Lei Hou
Ling Feng
Juanzi Li
OffRL
LRM
2
0
0
19 May 2025
Fixed Point Explainability
Emanuele La Malfa
Jon Vadillo
Marco Molinari
Michael Wooldridge
7
0
0
18 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
Yangou Ouyang
...
Feiyu Xiong
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
7
0
0
18 May 2025
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Maoyuan Ye
Jing Zhang
Juhua Liu
Bo Du
Dacheng Tao
LRM
4
0
0
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
12
0
0
18 May 2025
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo
Zhaomin Wu
Philip S. Yu
4
0
0
18 May 2025
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza
Nawaf Alampara
Martiño Ríos-García
Mohamed Abdelalim
Jack Butler
...
Mark Worrall
Adamo Young
Philippe Schwaller
Michael Pieler
Kevin Maik Jablonka
9
0
0
18 May 2025
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
17
0
0
18 May 2025