ResearchTrend.AI
arXiv: 2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
Heimdall: test-time scaling on the generative verification
Wenlei Shi
Xing Jin
LRM
139
7
0
14 Apr 2025
Can LLMs Classify CVEs? Investigating LLMs Capabilities in Computing CVSS Vectors
Francesco Marchiori
Denis Donadel
Mauro Conti
78
0
0
14 Apr 2025
Training Small Reasoning LLMs with Cognitive Preference Alignment
Wenrui Cai
Chengyu Wang
Junbing Yan
Jun Huang
Xiangzhong Fang
LRM
92
2
0
14 Apr 2025
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
Thilo Hagendorff
Sarah Fabi
ReLM, ELM, LRM
72
0
0
14 Apr 2025
Weight Ensembling Improves Reasoning in Language Models
Xingyu Dang
Christina Baek
Kaiyue Wen
Zico Kolter
Aditi Raghunathan
MoMe, LRM
119
4
0
14 Apr 2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang
Wen-Ding Li
Daniele Paliotta
Daniel Ritter
Alexander M. Rush
Tri Dao
LRM
113
2
0
14 Apr 2025
PestMA: LLM-based Multi-Agent System for Informed Pest Management
Hongrui Shi
Shunbao Li
Zhipeng Yuan
Po Yang
LLMAG
83
0
0
14 Apr 2025
EMAFusion: A Self-Optimizing System for Seamless LLM Selection and Integration
Soham Shah
Kumar Shridhar
Surojit Chatterjee
Souvik Sen
96
0
0
14 Apr 2025
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Wei Wei
Jiayu Lin
Xinyi Mou
Shiyue Yang
Xiawei Liu
...
Jiebo Luo
Shiping Tang
Libo Wu
Baohua Zhou
Zhongyu Wei
LLMAG
174
6
0
14 Apr 2025
RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability
Yuanhang Zhang
Zihao Zeng
Dongbai Li
Yao Huang
Zhijie Deng
Yinpeng Dong
LRM
108
10
0
14 Apr 2025
Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability
Haotian Wang
Han Zhao
Shuaiting Chen
Xiaoyu Tian
Sitong Zhao
Yunjie Ji
Yiping Peng
Xiangang Li
ReLM, LRM
106
0
0
13 Apr 2025
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
Zhenting Wang
Guofeng Cui
Kun Wan
Wentian Zhao
84
4
0
13 Apr 2025
ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model
Wuyang Lan
Wenzheng Wang
Changwei Ji
Guoxing Yang
Y. Zhang
Xiaohong Liu
Song Wu
Guangyu Wang
LM&MA, ELM, LRM, AI4MH
148
3
0
13 Apr 2025
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Xingjian Zhang
Siwei Wen
Wenjun Wu
Lei Huang
LRM
143
16
0
13 Apr 2025
Towards Automated Formal Verification of Backend Systems with LLMs
Kangping Xu
Yifan Luo
Yang Yuan
Andrew Chi-Chih Yao
10
0
0
13 Apr 2025
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Wang Yang
Xiang Yue
Vipin Chaudhary
Xiaotian Han
ReLM, LRM
129
11
0
12 Apr 2025
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
Jian Wu
Hao Yang
Xinhua Zeng
Guibing He
Zhe Chen
Zhu Li
Xinming Zhang
Yangyang Ma
Run Fang
Yang Liu
LRM
397
1
0
12 Apr 2025
A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions
Chengyu Wang
Taolin Zhang
Richang Hong
Jun Huang
ReLM, LRM
111
2
0
12 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
102
1
0
11 Apr 2025
Playpen: An Environment for Exploring Learning Through Conversational Interaction
Nicola Horst
Davide Mazzaccara
Antonia Schmidt
Michael Sullivan
Filippo Momentè
...
Alexander Koller
Oliver Lemon
David Schlangen
Mario Giulianelli
Alessandro Suglia
OffRL
124
0
0
11 Apr 2025
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
137
0
0
11 Apr 2025
SortBench: Benchmarking LLMs based on their ability to sort lists
Steffen Herbold
RALM, LRM
62
0
0
11 Apr 2025
ML For Hardware Design Interpretability: Challenges and Opportunities
Raymond Baartmans
Andrew Ensinger
Victor Agostinelli
Lizhong Chen
87
0
0
11 Apr 2025
The KL3M Data Project: Copyright-Clean Training Resources for Large Language Models
Michael J Bommarito II
Jillian Bommarito
Daniel Martin Katz
AILaw
126
1
0
10 Apr 2025
Automating quantum feature map design via large language models
Kenya Sakka
K. Mitarai
Keisuke Fujii
98
2
0
10 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
Chao Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Wenhu Chen
OffRL, ReLM, SyDa, LRM, VLM
179
40
0
10 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Z. Huang
Zhe Chen
Zijia Zhao
Ziwei Chen
Zongyu Lin
MLLM, VLM, MoE
413
32
0
10 Apr 2025
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Yukun Qi
Yiming Zhao
Y. Zeng
Xikun Bao
Wenjie Huang
Lin Yen-Chen
Zehui Chen
Jie Zhao
Zhongang Qi
Feng Zhao
LRM
123
4
0
10 Apr 2025
DeepGreen: Effective LLM-Driven Green-washing Monitoring System Designed for Empirical Testing -- Evidence from China
Congluo Xu
Yu Miao
Yiling Xiao
Chengmengjia Lin
63
0
0
10 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Xinze Wang
Zhiyong Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD, ReLM, LRM, VLM
237
19
0
10 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
Cengiz Pehlevan
Samy Jelassi
Eran Malach
ReLM, LRM
388
20
0
10 Apr 2025
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Yixin Cao
Jiahao Ying
Yansen Wang
Xipeng Qiu
Xuanjing Huang
Yugang Jiang
ELM
117
2
0
10 Apr 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen
Haoqin Tu
Fali Wang
Hui Liu
Xianfeng Tang
Xinya Du
Yuyin Zhou
Cihang Xie
ReLM, VLM, OffRL, LRM
183
36
0
10 Apr 2025
Enhancing Player Enjoyment with a Two-Tier DRL and LLM-Based Agent System for Fighting Games
Shouren Wang
Zehua Jiang
Fernando Sliva
Sam Earle
Julian Togelius
46
0
0
10 Apr 2025
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
Tuhin Chakrabarty
Philippe Laban
Chien-Sheng Wu
118
4
0
10 Apr 2025
Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations
Zican Dong
Han Peng
Peiyu Liu
Wayne Xin Zhao
Dong Wu
Feng Xiao
Ziyi Wang
MoE
88
2
0
09 Apr 2025
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models
Ling Team
Caizhi Tang
Chilin Fu
Chunwei Wu
Jia Guo
...
Shuaicheng Li
Yanzhe Zhang
Yingting Wu
Y. Liu
Zhenyu Huang
LRM
79
1
0
09 Apr 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Xinhao Li
Ziang Yan
Desen Meng
Lu Dong
Xiangyu Zeng
Yinan He
Yun Wang
Yu Qiao
Yi Wang
Limin Wang
VLM, AI4TS, LRM
141
38
0
09 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM, ALM, LRM
242
26
0
09 Apr 2025
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking
Chang Nie
Yiqing Xu
Guangming Wang
Yanfeng Guo
Yanzi Miao
Hesheng Wang
VLM
89
1
0
09 Apr 2025
GAAPO: Genetic Algorithmic Applied to Prompt Optimization
Xavier Sécheresse
Jacques-Yves Guilbert--Ly
Antoine Villedieu de Torcy
139
0
0
09 Apr 2025
OmniCaptioner: One Captioner to Rule Them All
Yiting Lu
Jiakang Yuan
Zhen Li
Jike Zhong
Qi Qin
...
Lei Bai
Zhibo Chen
Peng Gao
Bo Zhang
Peng Gao
MLLM
161
2
0
09 Apr 2025
Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis
Umakanta Maharana
Sarthak Verma
Avarna Agarwal
Prakashini Mruthyunjaya
Dwarikanath Mahapatra
Sakir Ahmed
Murari Mandal
484
1
0
09 Apr 2025
On the Suitability of Reinforcement Fine-Tuning to Visual Tasks
X. Chen
Wei Li
Chunxu Liu
Chi Xie
Xiaoyan Hu
Chengqian Ma
Feng Zhu
Rui Zhao
ReLM, LRM
161
2
0
08 Apr 2025
Agent Guide: A Simple Agent Behavioral Watermarking Framework
Kaibo Huang
Zhongliang Yang
Linna Zhou
136
0
0
08 Apr 2025
GOLLuM: Gaussian Process Optimized LLMs -- Reframing LLM Finetuning through Bayesian Optimization
Bojana Ranković
P. Schwaller
BDL
495
1
0
08 Apr 2025
Adversarial Training of Reward Models
Alexander Bukharin
Haifeng Qian
Shengyang Sun
Adithya Renduchintala
Soumye Singhal
Ziyi Wang
Oleksii Kuchaiev
Olivier Delalleau
T. Zhao
AAML
177
2
0
08 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang
Haitao Wu
Changqing Zhang
Peilin Zhao
Yatao Bian
ReLM, LRM
199
19
0
08 Apr 2025
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Yiying Yang
Wei Cheng
Sijin Chen
Xianfang Zeng
Jiaxu Zhang
Liao Wang
Gang Yu
Xingjun Ma
Yu Jiang
VLM
148
6
0
08 Apr 2025
ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs
Gejian Zhao
Hanzhou Wu
Xinpeng Zhang
Athanasios V. Vasilakos
LRM
95
4
0
08 Apr 2025