ResearchTrend.AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
    ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
SIMCOPILOT: Evaluating Large Language Models for Copilot-Style Code Generation
Mingchao Jiang
Abhinav C. P. Jain
Sophia Zorek
C. Jermaine
LLMAG, ALM, ELM
29
0
0
21 May 2025
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Weixiang Zhao
Xingyu Sui
Yulin Hu
Jiahe Guo
Haixiao Liu
Biye Li
Yanyan Zhao
Bing Qin
Ting Liu
OffRL
115
1
0
21 May 2025
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
Jie Ma
Ning Qu
Zhitao Gao
Rui Xing
Jun Liu
...
Jiang Xie
Linyun Song
Pinghui Wang
Jing Tao
Zhou Su
86
0
0
21 May 2025
ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search
Hyunseok Lee
Jeonghoon Kim
Beomjun Kim
Jihoon Tack
Chansong Jo
Jaehong Lee
Cheonbok Park
Sookyo In
Jinwoo Shin
Kang Min Yoo
148
0
0
21 May 2025
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
Kangan Qian
Sicong Jiang
Yang Zhong
Ziang Luo
Zilin Huang
...
Yifei Hu
Guang Li
Guang Chen
Hao Ye
Lijun Sun
LRM
132
1
0
21 May 2025
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
Jinghui Lu
Haiyang Yu
Siliang Xu
Shiwei Ran
Guozhi Tang
...
Teng Fu
Hao Feng
Jingqun Tang
Hongru Wang
Can Huang
LRM
123
3
0
21 May 2025
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
Xinran Wang
Muxi Diao
Yuanzhi Liu
Chunyu Wang
Kongming Liang
Zhanyu Ma
Jun Guo
96
0
0
21 May 2025
Learning to Reason via Mixture-of-Thought for Logical Reasoning
Tong Zheng
Lichang Chen
Simeng Han
R. Thomas McCoy
Heng Huang
LRM
114
1
0
21 May 2025
Is (Selective) Round-To-Nearest Quantization All You Need?
Alex Kogan
MQ
57
0
0
21 May 2025
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Jiaming Zhou
Ke Ye
Jiayi Liu
Teli Ma
Zifang Wang
Ronghe Qiu
Kun-Yu Lin
Zhilin Zhao
Junwei Liang
134
2
0
21 May 2025
lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu
Mingjia Huo
Yu Zhang
Haoyang Yu
Eric P. Xing
Ion Stoica
Tajana Rosing
Haojian Jin
Hao Zhang
154
1
0
21 May 2025
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision
Eric Hanchen Jiang
Haozheng Luo
Shengyuan Pang
Xiaomin Li
Zhenting Qi
...
Zongyu Lin
Xinfeng Li
Hao Xu
Kai-Wei Chang
Ying Nian Wu
LRM
136
0
0
21 May 2025
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
Jan Tönshoff
Martin Grohe
102
0
0
21 May 2025
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder
Deep Karkhanis
OffRL
83
0
0
21 May 2025
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners
Weixiang Zhao
Jiahe Guo
Yang Deng
Tongtong Wu
Wenxuan Zhang
...
Yanyan Zhao
Wanxiang Che
Bing Qin
Tat-Seng Chua
Ting Liu
LRM
144
0
0
21 May 2025
An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents
Bowen Jin
Chang Jo Kim
Priyanka Kargupta
Sercan O. Arik
Jiawei Han
LRM
171
2
0
21 May 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Tencent Hunyuan Team
Ao Liu
Botong Zhou
Can Xu
Chayse Zhou
...
Bingxin Qu
Bolin Ni
Boyu Wu
Chen Li
Cheng-peng Jiang
MoE, LRM, AI4CE
172
0
0
21 May 2025
InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation
Yunjia Xi
Jianghao Lin
Menghui Zhu
Yongzhao Xiao
Zhuoying Ou
...
Weiwen Liu
Yasheng Wang
Ruiming Tang
Weinan Zhang
Yong Yu
132
1
0
21 May 2025
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Zhiwen Chen
Bo Leng
Zhuoren Li
Hanming Deng
Guizhe Jin
Ran Yu
Huanxi Wen
239
0
0
21 May 2025
MMaDA: Multimodal Large Diffusion Language Models
Ling Yang
Ye Tian
Bowen Li
Xinchen Zhang
Ke Shen
Yunhai Tong
Mengdi Wang
VLM, LRM
157
6
0
21 May 2025
NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Weiming Wu
Zi-kang Wang
Jin Ye
Zhi Zhou
Yu-Feng Li
Lan-Zhe Guo
LRM
71
0
0
21 May 2025
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Yurun Yuan
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
145
1
0
21 May 2025
Can Large Language Models Really Recognize Your Name?
Dzung Pham
Peter Kairouz
Niloofar Mireshghallah
Eugene Bagdasarian
Chau Minh Pham
Amir Houmansadr
PILM
70
1
0
20 May 2025
RLVR-World: Training World Models with Reinforcement Learning
Jialong Wu
Shaofeng Yin
Ningya Feng
Mingsheng Long
OffRL, VGen
89
2
0
20 May 2025
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications
Haijun Li
Tianqi Shi
Zifu Shang
Yuxuan Han
Xueyu Zhao
...
Longyue Wang
Gongbo Tang
Weihua Luo
Zhao Xu
Kaifu Zhang
ELM
57
0
0
20 May 2025
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
LRM
126
0
0
20 May 2025
SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning
Xiong Jun Wu
Zhenduo Zhang
ZuJie Wen
Zhiqiang Zhang
Wang Ren
...
Xudong Han
Chengfu Tang
Dingnan Jin
Qing Cui
Jun Zhou
LRM
238
1
0
20 May 2025
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang
Xun Wu
Shaohan Huang
Qingxiu Dong
Zewen Chi
Li Dong
Xingxing Zhang
Tengchao Lv
Lei Cui
Furu Wei
OffRL, LRM
163
5
0
20 May 2025
ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
Bufang Yang
Lilin Xu
Liekang Zeng
Kaiwei Liu
Siyang Jiang
Wenrui Lu
Hongkai Chen
Xiaofan Jiang
Guoliang Xing
Zhenyu Yan
LLMAG
116
0
0
20 May 2025
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
Jiaer Xia
Yuhang Zang
Peng Gao
Yixuan Li
Kaiyang Zhou
OffRL, ReLM, AI4TS, VLM, LRM
124
0
0
20 May 2025
Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen
Jiarui Lu
Minsu Kim
Dinghuai Zhang
Jian Tang
Alexandre Piché
Nicolas Angelard-Gontier
Yoshua Bengio
Ehsan Kamalloo
ReLM, LRM
126
0
0
20 May 2025
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Minwu Kim
Anubhav Shrestha
Safal Shrestha
Aadim Nepal
Keith Ross
88
0
0
20 May 2025
FLASH-D: FlashAttention with Hidden Softmax Division
K. Alexandridis
Vasileios Titopoulos
G. Dimitrakopoulos
53
0
0
20 May 2025
DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery
Kun Li
Zhennan Wu
Shoupeng Wang
Wenbin Hu
LLMAG, LM&MA
73
0
0
20 May 2025
General-Reasoner: Advancing LLM Reasoning Across All Domains
Xueguang Ma
Qian Liu
Dongfu Jiang
Ge Zhang
Zejun Ma
Wenhu Chen
AI4CE, LRM
142
6
0
20 May 2025
Context-Free Synthetic Data Mitigates Forgetting
Parikshit Bansal
Sujay Sanghavi
CLL
139
0
0
20 May 2025
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung
Sangyeon Yoon
Minsuk Kahng
Albert No
LRM, LLMSV
226
1
0
20 May 2025
Let LLMs Break Free from Overthinking via Self-Braking Tuning
Haoran Zhao
Yuchen Yan
Yongliang Shen
Haolei Xu
Wenqi Zhang
Kaitao Song
Jian Shao
Weiming Lu
Jun Xiao
Yueting Zhuang
LRM
127
0
0
20 May 2025
Improved Methods for Model Pruning and Knowledge Distillation
Wei Jiang
Anying Fu
Youling Zhang
VLM
27
0
0
20 May 2025
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Dejing Dou
LRM
108
0
0
20 May 2025
Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
Siddhant Bhambri
Upasana Biswas
Subbarao Kambhampati
152
1
0
20 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM, LRM
174
0
0
19 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
262
0
0
19 May 2025
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
Haiquan Wen
Yiwei He
Zhenglin Huang
Tianxiao Li
Zihan Yu
Xingru Huang
Lu Qi
Baoyuan Wu
Xuelong Li
Guangliang Cheng
VGen
121
0
0
19 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Yuxin Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
109
5
0
19 May 2025
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
Hengli Li
Chenxi Li
Tong Wu
Xuekai Zhu
Yuxuan Wang
...
Eric Hanchen Jiang
Song-Chun Zhu
Zixia Jia
Ying Nian Wu
Zilong Zheng
LRM
130
1
0
19 May 2025
Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang
Jie Lu
En Yu
70
1
0
19 May 2025
CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
Guoheng Sun
Ziyao Wang
Bowei Tian
Meng Liu
Zheyu Shen
Shwai He
Yexiao He
Wanghao Ye
Yiting Wang
Ang Li
LRM
70
0
0
19 May 2025
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu
Xueyi Chen
Rui Ming
Jilong Gao
Shoubo Hu
Zhuolun He
Bei Yu
LRM
142
0
0
19 May 2025
Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective
Zhongxiang Sun
Qipeng Wang
Haoyu Wang
Xiao Zhang
Jun Xu
HILM, LRM
120
0
0
19 May 2025