ResearchTrend.AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience
Runxiang Wang
Boxiao Wang
Kai Li
Yifan Zhang
Jian Cheng
40
0
0
04 Jun 2025
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
Qining Zhang
Lei Ying
81
0
0
03 Jun 2025
SurgVLM: A Large Vision-Language Model and Systematic Evaluation Benchmark for Surgical Intelligence
Zhitao Zeng
Zhu Zhuo
Xiaojun Jia
Erli Zhang
Junde Wu
...
Xiaochun Cao
Yutong Ban
Qi Dou
Yang Liu
Yueming Jin
VLM
76
0
0
03 Jun 2025
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
Shengjia Zhang
Junjie Wu
Jiawei Chen
Changwang Zhang
Yudi Wu
Wangchunshu Zhou
Sheng Zhou
Can Wang
Jun Wang
LRM
67
0
0
03 Jun 2025
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa
Thomas Fel
Ekdeep Singh Lubana
Bahareh Tolooshams
Demba Ba
73
0
0
03 Jun 2025
Native-Resolution Image Synthesis
Zidong Wang
Lei Bai
Xiangyu Yue
Wanli Ouyang
Yiyuan Zhang
81
0
0
03 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Shihan Dou
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Xuanjing Huang
ELM
94
0
0
03 Jun 2025
Automated Web Application Testing: End-to-End Test Case Generation with Large Language Models and Screen Transition Graphs
Nguyen-Khang Le
Quan Minh Bui
Minh Nguyen
Hiep Nguyen
Trung Vo
Son T. Luu
Shoshin Nomura
Minh Le Nguyen
69
0
0
03 Jun 2025
BNPO: Beta Normalization Policy Optimization
Changyi Xiao
Mengdi Zhang
Yixin Cao
OffRL
68
0
0
03 Jun 2025
Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning
Yin Fang
Qiao Jin
Guangzhi Xiong
Bowen Jin
Xianrui Zhong
Siru Ouyang
Aidong Zhang
Jiawei Han
Zhiyong Lu
ReLM, OffRL, LRM
59
0
0
03 Jun 2025
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Mengdi Jia
Zekun Qi
Shaochen Zhang
Wenyao Zhang
Xinqiang Yu
Jiawei He
He Wang
L. Yi
LRM, VLM
69
0
0
03 Jun 2025
Universal Reusability in Recommender Systems: The Case for Dataset- and Task-Independent Frameworks
Tri Kurniawan Wijaya
Xinyang Shao
Gonzalo Fiz Pontiveros
Edoardo D'Amico
31
0
0
03 Jun 2025
Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective
Shenghua He
Tian Xia
Xuan Zhou
Hui Wei
OffRL
80
0
0
03 Jun 2025
Beware! The AI Act Can Also Apply to Your AI Research Practices
Alina Wernick
Kristof Meding
23
0
0
03 Jun 2025
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
Yinjie Wang
Ling Yang
Ye Tian
Ke Shen
Mengdi Wang
LRM
96
1
0
03 Jun 2025
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Zijian Wu
Jinjie Ni
Xiangyan Liu
Zichen Liu
Hang Yan
Michael Shieh
OffRL, ReLM, LRM
56
0
0
02 Jun 2025
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
Desen Meng
Rui Huang
Zhilin Dai
Xinhao Li
Yifan Xu
...
Z. Huang
Meng Zhang
L. Zhang
Yi Liu
Limin Wang
OffRL, VLM, LRM
64
0
0
02 Jun 2025
Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
Yihong Tang
Kehai Chen
Muyun Yang
Zhengyu Niu
Jing Li
Tiejun Zhao
Min Zhang
LLMAG, AI4CE, LRM
65
0
0
02 Jun 2025
K12Vista: Exploring the Boundaries of MLLMs in K-12 Education
Chong Li
C. Zhu
Tao Zhang
Mingan Lin
Zenan Zhou
Jian Xie
LRM
63
0
0
02 Jun 2025
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu
Faisal Hamman
Sanghamitra Dutta
ALM
81
0
0
02 Jun 2025
ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs
Zeming Wei
Chengcan Wu
Meng Sun
66
0
0
02 Jun 2025
Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark
Shuyu Yang
Yilun Wang
Yaxiong Wang
Li Zhu
Zhedong Zheng
VGen
79
0
0
02 Jun 2025
RAISE: Reasoning Agent for Interactive SQL Exploration
Fernando Granado
R. Lotufo
J. Pereira
ReLM, LRM
53
0
0
02 Jun 2025
AI Scientists Fail Without Strong Implementation Capability
Minjun Zhu
Qiujie Xie
Yixuan Weng
Jian Wu
Zhen Lin
Linyi Yang
Yue Zhang
ELM
102
0
0
02 Jun 2025
Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
Taehoon Yoon
Yunhong Min
Kyeongmin Yeo
Minhyuk Sung
107
0
0
02 Jun 2025
StochasTok: Improving Fine-Grained Subword Understanding in LLMs
Anya Sims
Thom Foster
Klara Kaleb
Tuan-Duy H. Nguyen
Joseph Lee
Jakob N. Foerster
Yee Whye Teh
Cong Lu
122
1
0
02 Jun 2025
Compiler Optimization via LLM Reasoning for Efficient Model Serving
Sujun Tang
Christopher Priebe
R. Mahapatra
Lianhui Qin
H. Esmaeilzadeh
LRM
76
0
0
02 Jun 2025
Self-Refining Language Model Anonymizers via Adversarial Distillation
Kyuyoung Kim
Hyunjun Jeon
Jinwoo Shin
PILM
85
0
0
02 Jun 2025
Fodor and Pylyshyn's Legacy - Still No Human-like Systematic Compositionality in Neural Networks
Tim Woydt
Moritz Willig
Antonia Wüst
Lukas Helff
Wolfgang Stammer
Constantin Rothkopf
Kristian Kersting
70
1
0
02 Jun 2025
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Zhongwei Wan
Zhihao Dou
Che Liu
Yu Zhang
Dongfei Cui
...
Yifan Jiang
Yangfan He
Mi Zhang
Shen Yan
LRM
110
1
0
02 Jun 2025
TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
Yiran Zhang
Mo Wang
Xiaoyang Li
Kaixuan Ren
Chencheng Zhu
Usman Naseem
LRM
79
0
0
02 Jun 2025
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning
Yijun Yang
Zhao-Yang Wang
Qiuping Liu
Shuwen Sun
Kang Wang
...
Zongwei Zhou
Alan Yuille
Lei Zhu
Yu Zhang
Jieneng Chen
35
0
0
02 Jun 2025
Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning
Yizhuo Zhang
Heng Wang
Shangbin Feng
Zhaoxuan Tan
Xinyun Liu
Yulia Tsvetkov
OffRL
92
0
0
01 Jun 2025
Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision
Jiahui Zhou
Dan Li
Lin Li
Zhuomin Chen
Shunyu Wu
Haozheng Ye
Jian Lou
Costas J. Spanos
AI4TS, LRM
45
0
0
01 Jun 2025
FedRPCA: Enhancing Federated LoRA Aggregation Using Robust PCA
Divyansh Jhunjhunwala
Arian Raje
Madan Ravi Ganesh
Chaithanya Kumar Mummadi
Chaoqun Dong
Jiawei Zhou
Wan-Yi Lin
Gauri Joshi
Zhenzhen Li
56
0
0
01 Jun 2025
ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation
Xinyi Liu
Lipeng Ma
Yixuan Li
Weidong Yang
Qingyuan Zhou
Jiayi Song
Shuhao Li
Ben Fei
LRM
57
0
0
01 Jun 2025
MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book
Sau Lai Yip
Sunan He
Yuxiang Nie
Shu Pui Chan
Yilin Ye
Sum Ying Lam
Hao-tao Chen
LM&MA
54
0
0
01 Jun 2025
Doubly Robust Alignment for Large Language Models
Erhan Xu
Kai Ye
Hongyi Zhou
Luhan Zhu
Francesco Quinzan
Chengchun Shi
53
0
0
01 Jun 2025
Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges
Lajos Muzsai
David Imolai
András Lukács
LLMAG, LRM
40
0
0
01 Jun 2025
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking
Yufei Zhan
Ziheng Wu
Yousong Zhu
Rongkun Xue
Ruipu Luo
...
Zhentao He
Zheming Yang
Ming Tang
Minghui Qiu
Jinqiao Wang
MLLM, ReLM, LRM
78
0
0
01 Jun 2025
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
Thinh Pham
Nguyen Nguyen
Pratibha Zunjare
Weiyuan Chen
Yu-Min Tseng
Tu Vu
RALM, ReLM, ELM, ALM, LRM
106
0
0
01 Jun 2025
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
Zihang Liu
Tianyu Pang
Oleg Balabanov
Chaoqun Yang
Tianjin Huang
L. Yin
Yaoqing Yang
Shiwei Liu
LRM
77
1
0
01 Jun 2025
Attention Retrieves, MLP Memorizes: Disentangling Trainable Components in the Transformer
Yihe Dong
Lorenzo Noci
Mikhail Khodak
Mufan Li
71
0
0
01 Jun 2025
IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection
Wayne Zhang
Changjiang Jiang
Zhonghao Zhang
Chenyang Si
Fengchang Yu
Wei Peng
51
0
0
01 Jun 2025
CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning
Ke Niu
Z. Chen
Haiyang Yu
Yuwen Chen
Teng Fu
Mengyang Zhao
Bin Li
Xiangyang Xue
37
0
0
31 May 2025
Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning
Sara Ghazanfari
Francesco Croce
Nicolas Flammarion
Prashanth Krishnamurthy
Farshad Khorrami
S. Garg
LRM
39
0
0
31 May 2025
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou
S. Wang
Xingyu Dong
Xiangqi Jin
Yifang Chen
Yue Min
Kexin Yang
Xingzhang Ren
Dayiheng Liu
Linfeng Zhang
OffRL, LRM
42
0
0
31 May 2025
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
Weijie Xu
Shixian Cui
Xi Fang
Chi Xue
Stephanie Eckman
Chandan K. Reddy
ELM
59
0
0
31 May 2025
DeepRAG: Integrating Hierarchical Reasoning and Process Supervision for Biomedical Multi-Hop QA
Yuelyu Ji
Hang Zhang
Shiven Verma
Hui Ji
Chun Li
Yushui Han
YanShan Wang
LRM
33
0
0
31 May 2025
Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors
Henrik Klagges
Robert Dahlke
Fabian Klemm
Benjamin Merkel
Daniel Klingmann
David A. Reiss
Dan Zecha
MoMe, MoE
50
0
0
31 May 2025