ResearchTrend.AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning


22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
Title
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
Yuqi Liu
Bohao Peng
Zhisheng Zhong
Zihao Yue
Fanbin Lu
Bei Yu
Jiaya Jia
LRM, VLM
132
46
0
01 Jul 2025
RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought
Junbo Qiao
Miaomiao Cai
Wei Li
Y. Liu
X. Y. Huang
Gaoqi He
Jiao Xie
Jie Hu
X. Chen
Shaohui Lin
SupR, VLM, LRM
84
0
0
20 Jun 2025
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning
Yanzhi Zhang
Zhaoxi Zhang
Haoxiang Guan
Yilin Cheng
Yitong Duan
Chen Wang
Yue Wang
Shuxin Zheng
Jiyan He
ReLM, LRM
66
0
0
20 Jun 2025
DistillNote: LLM-based clinical note summaries improve heart failure diagnosis
Heloisa Oss Boll
Antonio Oss Boll
Leticia Puttlitz Boll
Ameen Abu-Hanna
Iacer Calixto
30
0
0
20 Jun 2025
When Can Model-Free Reinforcement Learning be Enough for Thinking?
Josiah P. Hanna
Nicholas Corrado
OffRL, LM&Ro, ReLM, LRM, AI4CE
38
0
0
20 Jun 2025
VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning
Zhangyang Qi
Zhixiong Zhang
Yizhou Yu
Jiaqi Wang
Hengshuang Zhao
LM&Ro, AI4TS
68
0
0
20 Jun 2025
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?
Adithya Bhaskar
Alexander Wettig
Tianyu Gao
Yihe Dong
Danqi Chen
27
0
0
20 Jun 2025
Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs
Haoran Sun
Yankai Jiang
Wenjie Lou
Yujie Zhang
Wenjie Li
Lilong Wang
Mianxin Liu
Lei Liu
Xiaosong Wang
LRM
27
0
0
20 Jun 2025
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
Xuechen Zhang
Zijian Huang
Yingcong Li
Chenshun Ni
Jiasi Chen
Samet Oymak
OffRL, MoE, LRM
52
0
0
20 Jun 2025
LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning
Haoyue Zhang
Hualei Zhang
Xiaosong Ma
Jie Zhang
Song Guo
LRM
25
0
0
19 Jun 2025
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Yi Chen
Yuying Ge
Rui Wang
Yixiao Ge
Junhao Cheng
Ying Shan
Xihui Liu
OffRL, VLM, LRM
40
0
0
19 Jun 2025
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Zhiyuan Liang
Dongwen Tang
Yuhao Zhou
Xuanlei Zhao
Mingjia Shi
...
Damian Borth
Michael M. Bronstein
Yang You
Zhangyang Wang
Kai Wang
OffRL
39
0
0
19 Jun 2025
ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning
Zexi Liu
Yuzhu Cai
Xinyu Zhu
Yujie Zheng
Runkun Chen
Ying Wen
Yanfeng Wang
Weinan E
Siheng Chen
LLMAG, LRM
22
0
0
19 Jun 2025
DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling
Fei Wang
Xingchen Wan
Ruoxi Sun
Jiefeng Chen
Sercan Ö. Arık
LRM
34
0
0
19 Jun 2025
OJBench: A Competition Level Code Benchmark For Large Language Models
Zhexu Wang
Y. Liu
Yejie Wang
Wenyang He
Bofei Gao
...
Kelin Fu
Flood Sung
Zhilin Yang
Tianyu Liu
Weiran Xu
ReLM, LRM, ELM
33
0
0
19 Jun 2025
FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE
Khiem Le
Tuan V. Tran
Ting Hua
Nitesh Chawla
MoE
19
0
0
19 Jun 2025
When and How Unlabeled Data Provably Improve In-Context Learning
Yingcong Li
Xiangyu Chang
Muti Kara
Xiaofeng Liu
Amit K. Roy-Chowdhury
Samet Oymak
26
0
0
18 Jun 2025
RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments
Yuchuan Fu
Xiaohan Yuan
Dongxia Wang
LLMAG, ELM
17
0
0
18 Jun 2025
Approximating Language Model Training Data from Weights
John X. Morris
Junjie Oscar Yin
Woojeong Kim
Vitaly Shmatikov
Alexander M. Rush
49
0
0
18 Jun 2025
Lessons from Training Grounded LLMs with Verifiable Rewards
Shang Hong Sim
Tej Deep Pala
Vernon Y.H. Toh
Hai Leong Chieu
Amir Zadeh
Chuan Li
Navonil Majumder
Soujanya Poria
OffRL, RALM, LRM
28
0
0
18 Jun 2025
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Feng He
Zijun Chen
Xinnian Liang
Tingting Ma
Yunqi Qiu
Shuangzhi Wu
Junchi Yan
LRM
96
0
0
18 Jun 2025
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
32
0
0
18 Jun 2025
Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
Zongxia Li
Yapei Chang
Yuhang Zhou
Xiyang Wu
Zichao Liang
Yoo Yeon Sung
Jordan L. Boyd-Graber
31
0
0
18 Jun 2025
Reward Models in Deep Reinforcement Learning: A Survey
Rui Yu
Shenghua Wan
Yucen Wang
Chen-Xiao Gao
Le Gan
Zongzhang Zhang
De-Chuan Zhan
OffRL
32
0
0
18 Jun 2025
AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning
Tevin Wang
Chenyan Xiong
LRM
40
0
0
18 Jun 2025
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
Zijian Zhou
Ao Qu
Zhaoxuan Wu
Sunghwan Kim
Alok Prakash
Daniela Rus
Jinhua Zhao
Bryan Kian Hsiang Low
Paul Liang
LLMAG, OffRL, LRM
30
0
0
18 Jun 2025
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
Xianliang Yang
Ling Zhang
Haolong Qian
Lei Song
Jiang Bian
25
0
0
18 Jun 2025
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Andrew Wagenmaker
Mitsuhiko Nakamoto
Yunchu Zhang
S. Park
Waleed Yagoub
Anusha Nagabandi
Abhishek Gupta
Sergey Levine
OffRL
45
0
0
18 Jun 2025
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Yanxu Mao
Tiehan Cui
Peipei Liu
Datao You
Hongsong Zhu
AAML
23
0
0
18 Jun 2025
CC-LEARN: Cohort-based Consistency Learning
Xiao Ye
Shaswat Shrivastava
Zhaonan Li
Jacob Dineen
Shijie Lu
Avneet Ahuja
Ming shen
Zhikun Xu
Ben Zhou
OffRL, LRM
60
0
0
18 Jun 2025
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
Sheng Liu
Tianlang Chen
Pan Lu
Haotian Ye
Yizheng Chen
Lei Xing
James Zou
ReLM, LRM
26
0
0
18 Jun 2025
Truncated Proximal Policy Optimization
Tiantian Fan
L. J. Liu
Yu Yue
Jiaze Chen
C. Wang
...
Zhi-Li Zhang
Xin Liu
Mingxuan Wang
Lin Yan
Yonghui Wu
OffRL, LRM
20
0
0
18 Jun 2025
Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement
Weixiang Zhao
Jiahe Guo
Yang Deng
Xingyu Sui
Yulin Hu
Yanyan Zhao
Wanxiang Che
Bing Qin
Tat-Seng Chua
Ting Liu
LRM
70
0
0
18 Jun 2025
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning
Yizhen Zhang
Yang Ding
Shuoshuo Zhang
Xinchen Zhang
Haoling Li
...
Jie Wu
Lei Ji
Yelong Shen
Y. Yang
Yeyun Gong
OffRL, VLM, LRM
39
0
0
17 Jun 2025
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
Ling Team
Bin Hu
Cai Chen
Deng Zhao
Ding Liu
...
Zhenglei Zhou
Zhenyu Huang
Zhiqiang Zhang
Zihao Wang
Zujie Wen
OffRL, MoE, ALM, LRM
59
0
0
17 Jun 2025
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Xumeng Wen
Zihan Liu
Shun Zheng
Zhijian Xu
Shengyu Ye
...
Yang Wang
Junjie Li
Ziming Miao
Jiang Bian
Mao Yang
LRM
48
0
0
17 Jun 2025
Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent
Xueyang Feng
Jingsen Zhang
Jiakai Tang
Wei Li
Guohao Cai
X. Chen
Quanyu Dai
Y. Zhu
Zhenhua Dong
29
0
0
17 Jun 2025
Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng
Shaohan Huang
Xuekai Zhu
Bo Dai
Wayne Xin Zhao
Zhenliang Zhang
Furu Wei
LRM
43
0
0
17 Jun 2025
M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models
Can Zheng
Jiguang He
Chung G. Kang
Guofa Cai
Zitong Yu
Merouane Debbah
MoE
28
0
0
17 Jun 2025
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
Ads Dawson
Rob Mulla
Nick Landers
Shane Caldwell
ELM
38
0
0
17 Jun 2025
Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy?
Louis Vervoort
Vitaly Nikolaev
33
0
0
17 Jun 2025
Explainable Detection of Implicit Influential Patterns in Conversations via Data Augmentation
Sina Abdidizaji
Md. Kowsher
Niloofar Yousefi
Ivan I. Garibay
36
0
0
17 Jun 2025
RadFabric: Agentic AI System with Reasoning Capability for Radiology
Wenting Chen
Yi Dong
Zhaojun Ding
Yucheng Shi
Yifan Zhou
...
Tianming Liu
Ninghao Liu
Lichao Sun
Yixuan Yuan
Xiang Li
MedIm
33
0
0
17 Jun 2025
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
Zewei Zhou
Tianhui Cai
Seth Z. Zhao
Yun Zhang
Zhiyu Huang
Bolei Zhou
Jiaqi Ma
LRM, VLM
36
0
0
16 Jun 2025
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
Zhucun Xue
Jiangning Zhang
Xurong Xie
Yuxuan Cai
Yong-Jin Liu
Xiangtai Li
Dacheng Tao
VGen, VLM
49
0
0
16 Jun 2025
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
Zihan Liu
Zhuolin Yang
Yang Chen
Chankyu Lee
Mohammad Shoeybi
Bryan Catanzaro
Wei Ping
OffRL, ReLM, LRM
53
0
0
16 Jun 2025
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
Haibo Qiu
X. Lan
Fanfan Liu
Xiaohu Sun
Delian Ruan
Peng Shi
Lin Ma
ReLM, OffRL, LRM
68
0
0
16 Jun 2025
Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs
Gyutaek Oh
Seoyeon Kim
Sangjoon Park
Byung-Hoon Kim
LM&MA, LRM
45
0
0
16 Jun 2025
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Runpeng Yu
Qi Li
Xinchao Wang
DiffM, AI4CE
61
0
0
16 Jun 2025
IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation
Zijie Lin
Yang Zhang
Xiaoyan Zhao
Fengbin Zhu
Fuli Feng
Tat-Seng Chua
40
0
0
16 Jun 2025