ResearchTrend.AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning


22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma
Linge Du
Xuyang Shen
Shaoxiang Chen
Pengfei Li
Qibing Ren
Lizhuang Ma
Yuchao Dai
Pengfei Liu
Junjie Yan
OffRL, LRM
139
0
0
23 May 2025
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL
Che Liu
Haozhe Wang
J. Pan
Zhongwei Wan
Yong Dai
Fangzhen Lin
Wenjia Bai
Daniel Rueckert
Rossella Arcucci
OffRL, LRM, ELM
118
1
0
23 May 2025
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Amirhosein Ghasemabadi
Keith G. Mills
Baochun Li
Di Niu
LRM
87
0
0
23 May 2025
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
Michael Hassid
Gabriel Synnaeve
Yossi Adi
Roy Schwartz
ReLM, LRM
118
1
0
23 May 2025
Two-Stage Regularization-Based Structured Pruning for LLMs
Mingkuan Feng
Jinyang Wu
Siyuan Liu
Shuai Zhang
Hongjian Fang
Ruihan Jin
Feihu Che
Pengpeng Shao
Zhengqi Wen
57
0
0
23 May 2025
Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving
Zixian Guo
Ming-Yu Liu
Zhilong Ji
Jinfeng Bai
Lei Zhang
W. Zuo
LRM, VLM
111
0
0
23 May 2025
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
Yue Jiang
Jichu Li
Yang Liu
Jinjie Wei
F. I. S. Kevin Zhou
Quyu Kong
MLLM
71
0
0
23 May 2025
Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN
Yao Xu
Mingyu Xu
Fangyu Lei
Wangtao Sun
Xiangrong Zeng
Bingning Wang
Guang Liu
Shizhu He
Jun Zhao
Kang Liu
LRM
84
1
0
22 May 2025
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects
Jipeng Zhang
Haolin Yang
Kehao Miao
Ruiyuan Zhang
Renjie Pi
Jiahui Gao
Xiaofang Zhou
194
0
0
22 May 2025
Training-Free Reasoning and Reflection in MLLMs
Hongchen Wei
Zhenzhong Chen
OffRL, VLM, LRM
115
0
0
22 May 2025
Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games
Xiaoqing Zhang
Huabin Zheng
Ang Lv
Yuhan Liu
Zirui Song
Flood Sung
Xiuying Chen
Rui Yan
OffRL, ReLM, LRM, AI4CE
128
0
0
22 May 2025
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
Rui Ye
Xiangrui Liu
Qimin Wu
Xianghe Pang
Zhenfei Yin
Lei Bai
Siheng Chen
LLMAG
86
0
0
22 May 2025
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
Meng-Hao Guo
Xuanyu Chu
Qianrui Yang
Zhe-Han Mo
Yiqing Shen
...
Kiyohiro Nakayama
Zhengyang Geng
Houwen Peng
Han Hu
Shi-Min Hu
LRM
204
0
0
22 May 2025
ReCopilot: Reverse Engineering Copilot in Binary Analysis
Guoqiang Chen
Huiqi Sun
Daguang Liu
Zhiqi Wang
Qiang Wang
Bin Yin
Lu Liu
Lingyun Ying
45
0
0
22 May 2025
Constant Bit-size Transformers Are Turing Complete
Qian Li
Yuyi Wang
LRM
25
0
0
22 May 2025
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
Junhong Lin
Xinyue Zeng
Jie Zhu
Song Wang
Julian Shun
Jun Wu
Dawei Zhou
LRM
167
1
0
22 May 2025
LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead
Yifan Zhang
Xinkui Zhao
Zuxin Wang
Guanjie Cheng
Yueshen Xu
Shuiguang Deng
Yuxiang Cai
103
0
0
22 May 2025
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
Huanjin Yao
Qixiang Yin
Jingyi Zhang
Min Yang
Yibo Wang
...
Fei Su
Li Shen
Minghui Qiu
Dacheng Tao
Jiaxing Huang
LRM
74
0
0
22 May 2025
A Comprehensive Evaluation of Contemporary ML-Based Solvers for Combinatorial Optimization
Shengyu Feng
Weiwei Sun
Shanda Li
Ameet Talwalkar
Yiming Yang
89
1
0
22 May 2025
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment
Shuhao Han
Haotian Fan
Fangyuan Kong
Wenjie Liao
Chunle Guo
...
Jian Guo
Zhizhuo Shao
Ziyu Feng
Bing Li
Weiming Hu
198
11
0
22 May 2025
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
Beier Luo
Shuoyuan Wang
Yixuan Li
Jianguo Huang
73
0
0
22 May 2025
Optimal Policy Minimum Bayesian Risk
Ramón Fernandez Astudillo
Md Arafat Sultan
Aashka Trivedi
Yousef El-Kurdi
Tahira Naseem
Radu Florian
Salim Roukos
OffRL
61
0
0
22 May 2025
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation
Seamus Somerstep
Vinod Raman
Unique Subedi
Yuekai Sun
78
0
0
22 May 2025
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan
Kaituo Feng
Haoming Lyu
Dongzhan Zhou
Xiangyu Yue
ReLM, LRM
154
0
0
22 May 2025
Foundation Models for Geospatial Reasoning: Assessing Capabilities of Large Language Models in Understanding Geometries and Topological Spatial Relations
Yuhan Ji
Song Gao
Ying Nie
Ivan Majic
K. Janowicz
ReLM, LRM
173
2
0
22 May 2025
AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
Woosung Koh
Wonbeen Oh
Jaein Jang
MinHyung Lee
Hyeongjin Kim
Ah Yeon Kim
Joonkee Kim
Junghyun Lee
Taehyeon Kim
Se-Young Yun
LRM, TTA
119
0
0
22 May 2025
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
Ilgee Hong
Changlong Yu
Liang Qiu
Weixiang Yan
Zhenghao Xu
...
Qingru Zhang
Qin Lu
Xin Liu
Chao Zhang
Tuo Zhao
OffRL, ReLM, LRM
88
0
0
22 May 2025
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development
Yaxin Du
Yuzhu Cai
Yifan Zhou
Cheng-Yu Wang
Yu Qian
Xianghe Pang
Qian Liu
Yue Hu
Siheng Chen
70
0
0
22 May 2025
When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques
Jianing Geng
Biao Yi
Zekun Fei
Tongxi Wu
Lihai Nie
Zheli Liu
AAML
52
0
0
22 May 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
116
0
0
22 May 2025
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Fanrui Zhang
Dian Li
Qiang Zhang
Chenjun
sinbadliu
Junxiong Lin
Jiahong Yan
Jiawei Liu
Zheng-Jun Zha
OffRL
53
0
0
22 May 2025
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Wang Yang
Zirui Liu
Hongye Jin
Qingyu Yin
Vipin Chaudhary
Xiaotian Han
ReLM, LRM
85
0
0
22 May 2025
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Chenhao Zhang
Yazhe Niu
120
0
0
22 May 2025
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
Doohyuk Jang
Yoonjeon Kim
Chanjae Park
Hyun Ryu
Eunho Yang
LRM
105
0
0
22 May 2025
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Wenhui Tan
Jiaze Li
Jianzhong Ju
Zhenbo Luo
Jian Luan
Ruihua Song
ReLM, OffRL, LRM
109
1
0
22 May 2025
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong
Ziyu Guo
Renrui Zhang
Wenyu Shan
Xinyu Wei
Zhenghao Xing
Hongsheng Li
Pheng-Ann Heng
EGVM, OffRL, LRM
125
1
0
22 May 2025
Latent Principle Discovery for Language Model Self-Improvement
Keshav Ramji
Tahira Naseem
Ramón Fernandez Astudillo
LRM
113
0
0
22 May 2025
DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation
Bowen Zheng
Xiaolei Wang
Enze Liu
Xi Wang
Lu Hongyu
Yu Chen
Wayne Xin Zhao
Ji-Rong Wen
136
0
0
22 May 2025
$\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning
Runyang You
Chak Tou Leong
Xinyu Lin
Xin Zhang
Wenjie Wang
Wenjie Li
Liqiang Nie
LRM
97
0
0
22 May 2025
LLM Access Shield: Domain-Specific LLM Framework for Privacy Policy Compliance
Yu Wang
Cailing Cai
Zhihua Xiao
Peifung E. Lam
68
0
0
22 May 2025
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering
Bowen Jiang
Runchuan Zhu
Jiang Wu
Zinco Jiang
Yifan He
...
Haote Yang
Songyang Zhang
Dahua Lin
Lijun Wu
Conghui He
ELM
61
0
0
22 May 2025
MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems
Rui Ye
Keduan Huang
Qimin Wu
Yuzhu Cai
Tian Jin
...
Bo An
Yang Gao
Wenjun Wu
Lei Bai
Siheng Chen
LLMAG
141
1
0
22 May 2025
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
NovelSeek Team
Bo Zhang
Shiyang Feng
Xiangchao Yan
Jiakang Yuan
...
Zhongying Tu
Xiangyu Yue
W. Ouyang
Bowen Zhou
Lei Bai
LLMAG
128
2
0
22 May 2025
LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods
Hyang Cui
LRM
119
0
0
22 May 2025
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning
Shicheng Xu
Liang Pang
Yunchang Zhu
Jia Gu
Zihao Wei
Jingcheng Deng
Feiyang Pan
Huawei Shen
Xueqi Cheng
OffRL, LRM
126
0
0
22 May 2025
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities
Jinyang Wu
Chonghua Liao
Mingkuan Feng
Shuai Zhang
Zhengqi Wen
Pengpeng Shao
Huazhe Xu
Jianhua Tao
LRM, OffRL
158
3
0
21 May 2025
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
Jan Tönshoff
Martin Grohe
102
0
0
21 May 2025
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Zhexin Zhang
Yuhao Sun
Junxiao Yang
Shiyao Cui
Hongning Wang
Minlie Huang
AAML
106
0
0
21 May 2025
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Zhen Zhang
Xuehai He
Weixiang Yan
Ao Shen
Chenyang Zhao
Shuaiqiang Wang
Yelong Shen
Xin Eric Wang
LRM
123
3
0
21 May 2025
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Weixiang Zhao
Xingyu Sui
Yulin Hu
Jiahe Guo
Haixiao Liu
Biye Li
Yanyan Zhao
Bing Qin
Ting Liu
OffRL
115
1
0
21 May 2025