DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
    ReLM
    VLM
    OffRL
    AI4TS
    LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 830 papers shown
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
90
3
0
25 Mar 2025
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim
Taehoon Yoon
Jisung Hwang
Minhyuk Sung
DiffM
56
1
0
25 Mar 2025
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
Zhenyu Pan
Han Liu
OffRL
LRM
64
3
0
24 Mar 2025
Evolutionary Policy Optimization
Jianren Wang
Yifan Su
Abhinav Gupta
Deepak Pathak
50
0
0
24 Mar 2025
DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective
Changlun Li
Yao Shi
Yuyu Luo
Nan Tang
AIFin
57
0
0
24 Mar 2025
Sun-Shine: A Large Language Model for Tibetan Culture
Cheng Huang
Fan Gao
Nyima Tashi
Yutong Liu
Xiangxiang Wang
...
Gadeng Luosang
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Yongbin Yu
ALM
106
2
0
24 Mar 2025
Video-T1: Test-Time Scaling for Video Generation
F. Liu
Hanyang Wang
Yimo Cai
Kaiyan Zhang
Xiaohang Zhan
Yueqi Duan
DiffM
VGen
94
3
0
24 Mar 2025
Language Model Uncertainty Quantification with Attention Chain
Yinghao Li
Rushi Qiang
Lama Moukheiber
Chao Zhang
LRM
46
2
0
24 Mar 2025
RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation
Xiaolong Yin
Xingyu Lu
Jiahang Shen
Jingzhe Ni
Hailong Li
Ruofeng Tong
Min Tang
Peng Du
3DV
43
0
0
24 Mar 2025
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities
Weixiang Zhao
Xingyu Sui
Jiahe Guo
Yulin Hu
Yang Deng
Yanyan Zhao
Bing Qin
Wanxiang Che
Tat-Seng Chua
Ting Liu
ELM
LRM
69
5
0
23 Mar 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Zhiyu Lin
Yifei Gao
Xian Zhao
Yunfan Yang
Jitao Sang
LRM
62
3
0
23 Mar 2025
SG-Tailor: Inter-Object Commonsense Relationship Reasoning for Scene Graph Manipulation
Haoliang Shang
Hanyu Wu
Guangyao Zhai
Boyang Sun
Fangjinhua Wang
F. Tombari
Marc Pollefeys
67
0
0
23 Mar 2025
AgentRxiv: Towards Collaborative Autonomous Research
Samuel Schmidgall
Michael Moor
74
4
0
23 Mar 2025
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Yufei Zhan
Yousong Zhu
Shurong Zheng
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
VLM
67
5
0
23 Mar 2025
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
Oucheng Huang
Yuhang Ma
Zeng Zhao
Mingrui Wu
Jiayi Ji
Rongsheng Zhang
Zhibo Hu
Xiaoshuai Sun
Rongrong Ji
51
0
0
22 Mar 2025
OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery
Vignesh Prabhakar
Md Amirul Islam
Adam Atanas
Yansen Wang
J. N. Han
...
Rucha Apte
Robert Clark
Kang Xu
Zihan Wang
Kai Liu
LRM
93
2
0
22 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
52
1
0
22 Mar 2025
Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark
Zhiyu Li
Yiying Yang
Jiping Lang
Wenhao Jiang
Yuhang Zhao
...
Yuhua Bi
Xiaofei Zeng
Yixian Chen
Junrong Chen
Lin Yao
AI4MH
LM&MA
ELM
51
0
0
22 Mar 2025
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Li Zhang
Longxi Gao
Mengwei Xu
LRM
50
1
0
21 Mar 2025
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
Yuan Li
Jiahao Xu
Tian Liang
Xingyu Chen
Zhiwei He
...
Rui Wang
Zizhuo Zhang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
55
1
0
21 Mar 2025
Offline Model-Based Optimization: Comprehensive Review
Minsu Kim
Jiayao Gu
Ye Yuan
Taeyoung Yun
Ziqiang Liu
Yoshua Bengio
Can Chen
OffRL
67
2
0
21 Mar 2025
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
Ziyue Wang
Junde Wu
Linghan Cai
Chang Han Low
Xihong Yang
Qiaxuan Li
Yueming Jin
LRM
70
2
0
21 Mar 2025
Follow-up Question Generation For Enhanced Patient-Provider Conversations
Joseph Gatto
Parker Seegmiller
Timothy E. Burdick
Inas S. Khayal
Sarah DeLozier
S. Preum
LM&MA
MedIm
63
0
0
21 Mar 2025
LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language
Kun-Mo Chu
Xufeng Zhao
C. Weber
Stefan Wermter
LLMAG
LM&Ro
59
1
0
21 Mar 2025
TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning
Sheng Wang
Pengan Chen
Jingqi Zhou
Qintong Li
Jingwei Dong
Jiahui Gao
Boyang Xue
Jiyue Jiang
Lingpeng Kong
Chuan Wu
SyDa
71
0
0
21 Mar 2025
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Quy-Anh Dang
Chris Ngo
OffRL
LRM
52
11
0
20 Mar 2025
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction
Ziyao Guo
Kaipeng Zhang
Michael Qizhe Shieh
43
0
0
20 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen Zhong
Hanjie Chen
OffRL
ReLM
LRM
86
40
0
20 Mar 2025
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
Zhaowei Liu
X. Guo
Fangqi Lou
Lingfeng Zeng
Jinyi Niu
...
Sheng Xu
Dezhi Chen
Yun Chen
Zuo Bai
Liwen Zhang
ReLM
AIFin
OffRL
AI4TS
LRM
56
6
0
20 Mar 2025
Grammar and Gameplay-aligned RL for Game Description Generation with LLMs
Tsunehiko Tanaka
Edgar Simo-Serra
56
0
0
20 Mar 2025
Pseudo-Relevance Feedback Can Improve Zero-Shot LLM-Based Dense Retrieval
Hang Li
Xiao Wang
Bevan Koopman
Guido Zuccon
49
0
0
19 Mar 2025
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings
Zonghao Ying
Guangyi Zheng
Yongxin Huang
Deyue Zhang
Wenxin Zhang
Quanchen Zou
Aishan Liu
Xianglong Liu
Dacheng Tao
ELM
87
7
0
19 Mar 2025
Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs
Benjamin Estermann
Roger Wattenhofer
LRM
41
1
0
19 Mar 2025
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu
Lin Zhang
Pengtao Chen
Peng Ye
Xianfang Zeng
Wei Cheng
Gang Yu
Tao Chen
96
0
0
19 Mar 2025
Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering
Francesco Maria Molfese
Luca Moroni
Luca Gioffrè
Alessandro Sciré
Simone Conia
Roberto Navigli
ELM
84
0
0
19 Mar 2025
Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better
Meng Song
OffRL
51
0
0
19 Mar 2025
Benchmarking Open-Source Large Language Models on Healthcare Text Classification Tasks
Yuting Guo
A. Sarker
AI4MH
74
0
0
19 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
50
1
0
19 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yuyao Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
Tieniu Tan
230
2
0
18 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
91
4
0
18 Mar 2025
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao
Mingyang Wang
Zhou Yu
Wenwen Pan
Yan Yang
Tao Wei
Hao Zhang
Ning Mao
Wei Chen
Jun Yu
VLM
67
1
0
18 Mar 2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
Seungwon Lim
Sungwoong Kim
Jihwan Yu
Sungjae Lee
Jiwan Chung
Youngjae Yu
76
1
0
18 Mar 2025
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Yuxiang Lai
Shitian Zhao
Ming Li
Jike Zhong
Xiaofeng Yang
OffRL
LRM
LM&MA
VLM
81
11
0
18 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zhe Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
78
69
0
18 Mar 2025
Don't lie to your friends: Learning what you know from collaborative self-play
Jacob Eisenstein
Reza Aghajani
Adam Fisch
Dheeru Dua
Fantine Huot
Mirella Lapata
Vicky Zayats
Jonathan Berant
72
0
0
18 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jingyang Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
70
5
0
18 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
63
1
0
18 Mar 2025
Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts
Wenjing Zhang
Xuejiao Lei
Zhaoxiang Liu
Limin Han
Jiaojiao Zhao
...
Beibei Huang
Rongjia Du
Ning Wang
Kai Wang
Shiguo Lian
ELM
56
1
0
18 Mar 2025
Superalignment with Dynamic Human Values
Florian Mai
David Kaczér
Nicholas Kluge Corrêa
Lucie Flek
65
0
0
17 Mar 2025
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Botian Shi
Ding Wang
57
1
0
17 Mar 2025