ResearchTrend.AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Yufei Zhan
Yousong Zhu
Shurong Zheng
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
VLM
123
19
0
23 Mar 2025
SG-Tailor: Inter-Object Commonsense Relationship Reasoning for Scene Graph Manipulation
Haoliang Shang
Hanyu Wu
Guangyao Zhai
Boyang Sun
Fangjinhua Wang
F. Tombari
Marc Pollefeys
116
0
0
23 Mar 2025
OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery
Vignesh Prabhakar
Md Amirul Islam
Adam Atanas
Yansen Wang
J. N. Han
...
Rucha Apte
Robert Clark
Kang Xu
Zihan Wang
Kai Liu
LRM
248
5
0
22 Mar 2025
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
Oucheng Huang
Yuhang Ma
Zeng Zhao
Mingrui Wu
Jiayi Ji
Rongsheng Zhang
Zhibo Hu
Xiaoshuai Sun
Rongrong Ji
85
1
0
22 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL, LRM, AI4CE
130
1
0
22 Mar 2025
Offline Model-Based Optimization: Comprehensive Review
Minsu Kim
Jiayao Gu
Ye Yuan
Taeyoung Yun
Ziqiang Liu
Yoshua Bengio
Can Chen
OffRL
123
4
0
21 Mar 2025
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
Ziyue Wang
Junde Wu
Linghan Cai
Chang Han Low
Xihong Yang
Qiaxuan Li
Yueming Jin
LRM
176
2
0
21 Mar 2025
Follow-up Question Generation For Enhanced Patient-Provider Conversations
Joseph Gatto
Parker Seegmiller
Timothy E. Burdick
Inas S. Khayal
Sarah DeLozier
S. Preum
LM&MA, MedIm
121
0
0
21 Mar 2025
Position: Interactive Generative Video as Next-Generation Game Engine
Jiwen Yu
Yiran Qin
Haoxuan Che
Quande Liu
Xintao Wang
Pengfei Wan
Di Zhang
Xihui Liu
VGen
126
4
0
21 Mar 2025
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
Yuezun Li
Jiahao Xu
Tian Liang
Xingyu Chen
Zhiwei He
...
Rui Wang
Zizhuo Zhang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
95
3
0
21 Mar 2025
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Li Zhang
Longxi Gao
Mengwei Xu
LRM
84
2
0
21 Mar 2025
LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language
Kun-Mo Chu
Xufeng Zhao
C. Weber
Stefan Wermter
LLMAG, LM&Ro
117
3
0
21 Mar 2025
TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning
Sheng Wang
Pengan Chen
Jingqi Zhou
Qintong Li
Jingwei Dong
Jiahui Gao
Boyang Xue
Jiyue Jiang
Dianbo Sui
Chuan Wu
SyDa
122
0
0
21 Mar 2025
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
Zhaowei Liu
X. Guo
Fangqi Lou
Lingfeng Zeng
Jinyi Niu
...
Sheng Xu
Dezhi Chen
Yun Chen
Zuo Bai
Liwen Zhang
ReLM, AIFin, OffRL, AI4TS, LRM
125
15
0
20 Mar 2025
Grammar and Gameplay-aligned RL for Game Description Generation with LLMs
Tsunehiko Tanaka
Edgar Simo-Serra
117
1
0
20 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen Zhong
Hanjie Chen
OffRL, ReLM, LRM
218
101
0
20 Mar 2025
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Quy-Anh Dang
Chris Ngo
OffRL, LRM
196
20
0
20 Mar 2025
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction
Ziyao Guo
Jianchao Tan
Michael Qizhe Shieh
72
0
0
20 Mar 2025
Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs
Benjamin Estermann
Roger Wattenhofer
LRM
73
2
0
19 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
117
1
0
19 Mar 2025
Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering
Francesco Maria Molfese
Luca Moroni
Luca Gioffrè
Alessandro Sciré
Simone Conia
Roberto Navigli
ELM
130
2
0
19 Mar 2025
Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better
Meng Song
OffRL
78
0
0
19 Mar 2025
Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models
Hang Li
Xiao Wang
Bevan Koopman
Guido Zuccon
85
0
0
19 Mar 2025
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu
Lin Zhang
Pengtao Chen
Peng Ye
Xianfang Zeng
Wei Cheng
Gang Yu
Tao Chen
167
3
0
19 Mar 2025
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings
Zonghao Ying
Guangyi Zheng
Yongxin Huang
Deyue Zhang
Wenxin Zhang
Quanchen Zou
Aishan Liu
Xianglong Liu
Dacheng Tao
ELM
160
13
0
19 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
102
3
0
18 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
187
6
0
18 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jing Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE, LM&Ro, LRM
209
12
0
18 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL, LRM
274
217
0
18 Mar 2025
Don't lie to your friends: Learning what you know from collaborative self-play
Jacob Eisenstein
Reza Aghajani
Adam Fisch
Dheeru Dua
Fantine Huot
Mirella Lapata
Vicky Zayats
Jonathan Berant
158
0
0
18 Mar 2025
Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts
Wenjing Zhang
Xuejiao Lei
Zhaoxiang Liu
Limin Han
Jiaojiao Zhao
...
Beibei Huang
Rongjia Du
Ning Wang
Kai Wang
Shiguo Lian
ELM
119
1
0
18 Mar 2025
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao
Mingyang Wang
Zhou Yu
Wenwen Pan
Yan Yang
Tao Wei
Hao Zhang
Ning Mao
Wei Chen
Jun Yu
VLM
92
2
0
18 Mar 2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
Seungwon Lim
Sungwoong Kim
Jihwan Yu
Sungjae Lee
Jiwan Chung
Youngjae Yu
148
1
0
18 Mar 2025
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Yuxiang Lai
Shitian Zhao
Ming Li
Jike Zhong
Xiaofeng Yang
OffRL, LRM, LM&MA, VLM
203
31
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yize Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Zheng Zhang
Yan Huang
Liang Wang
Tieniu Tan
445
4
0
18 Mar 2025
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Botian Shi
Ding Wang
94
3
0
17 Mar 2025
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
Dingning Liu
Cheng Wang
Peng Gao
Renrui Zhang
Xinzhu Ma
Yuan Meng
Zhihui Wang
LRM
92
0
0
17 Mar 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
Chong Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM, LRM
124
8
0
17 Mar 2025
MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution
Zejia Zhang
Bo-Rong Yang
Xinxing Chen
Weizhuang Shi
Haoyuan Wang
Wei Luo
Jian Huang
70
0
0
17 Mar 2025
Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu
Xiangcong Yang
Yiyi Zhou
Chenxin Fang
Baiyang Song
Xiaoshuai Sun
Rongrong Ji
LRM
200
3
0
17 Mar 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
Hai-Long Sun
Zhun Sun
Houwen Peng
Han-Jia Ye
LRM
146
6
0
17 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
150
73
0
17 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Yang Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro, LRM
396
6
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
154
4
0
17 Mar 2025
LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation
Xuzhao Li
Shaika Chowdhury
Chung Il Wi
Maria Vassilaki
Ken Liu
...
Owen Garrick
Young J Juhn
James R Cerhan
Cui Tao
Nansu Zong
LM&MA
83
0
0
17 Mar 2025
Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
Yihong Luo
Tianyang Hu
Weijian Luo
Kenji Kawaguchi
Jing Tang
EGVM
478
0
0
17 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
161
6
0
17 Mar 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck
Korbinian Poppel
Phillip Lippe
Richard Kurle
P. Blies
Günter Klambauer
Sebastian Böck
Sepp Hochreiter
LRM
95
1
0
17 Mar 2025
RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning
Jerry Huang
Siddarth Madala
Risham Sidhu
Cheng Niu
Hao Peng
Julia Hockenmaier
Tong Zhang
LRM, RALM
216
5
0
17 Mar 2025
Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective
Luca Collini
Andrew Hennessee
Ramesh Karri
Siddharth Garg
ELM, LRM
95
2
0
17 Mar 2025