ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2501.12948
  4. Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Z. Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Hairu Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Zheyu Shen
Jingyang Yuan
Junjie Qiu
Jianxin Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Renqi Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
    ReLM
    VLM
    OffRL
    AI4TS
    LRM
ArXivPDFHTML

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 830 papers shown
Title
Grounded Chain-of-Thought for Multimodal Large Language Models
Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu
Xiangcong Yang
Yiyi Zhou
Chenxin Fang
Baiyang Song
Xiaoshuai Sun
Rongrong Ji
LRM
95
1
0
17 Mar 2025
LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation
LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation
Xuran Li
Shaika Chowdhury
Chung Il Wi
Maria Vassilaki
Ken Liu
...
Owen Garrick
Young J Juhn
James R Cerhan
Cui Tao
Nansu Zong
LM&MA
64
0
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
98
3
0
17 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
H. Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
65
25
0
17 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Yong Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
171
0
0
17 Mar 2025
MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution
MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution
Zejia Zhang
Bo-Rong Yang
Xinxing Chen
Weizhuang Shi
Haoyuan Wang
Wei Luo
Jian Huang
48
0
0
17 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
60
4
0
17 Mar 2025
RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning
RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning
Jerry Huang
Siddarth Madala
Risham Sidhu
Cheng Niu
Julia Hockenmaier
Tong Zhang
RALM
LRM
106
2
0
17 Mar 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck
Korbinian Poppel
Phillip Lippe
Richard Kurle
P. Blies
Günter Klambauer
Sebastian Böck
Sepp Hochreiter
LRM
59
1
0
17 Mar 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
Hai-Long Sun
Zhun Sun
Houwen Peng
Han-Jia Ye
LRM
50
0
0
17 Mar 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
Chong Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM
LRM
76
2
0
17 Mar 2025
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
Dingning Liu
Cheng Wang
Peng Gao
Renrui Zhang
Xinzhu Ma
Yuan Meng
Zhihui Wang
LRM
49
0
0
17 Mar 2025
Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective
Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective
Luca Collini
Andrew Hennessee
Ramesh Karri
Siddharth Garg
ELM
LRM
46
1
0
17 Mar 2025
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
Yihong Luo
Tianyang Hu
Weijian Luo
Kenji Kawaguchi
Jing Tang
EGVM
246
0
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yuhui Wang
Shengqiong Wu
Yuhan Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
95
11
0
16 Mar 2025
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu
Yunze Liu
Chonghan Liu
Miao Liu
VGen
LRM
64
2
0
16 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
2
0
16 Mar 2025
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Arsh Koneru
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VLM
72
1
0
15 Mar 2025
Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes
Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes
Da Wu
Zhanliang Wang
Quan Nguyen
Kai Wang
248
1
0
15 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Jun Wang
Jun Wang
222
0
0
15 Mar 2025
SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning
SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning
Edward Y. Chang
Longling Geng
LLMAG
LRM
50
0
0
15 Mar 2025
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang
Yong Liu
Weixing Chen
Jingzhou Luo
Ziliang Chen
Ling Pan
G. Li
Liang Lin
70
3
0
14 Mar 2025
Residual Policy Gradient: A Reward View of KL-regularized Objective
Pengcheng Wang
Xinghao Zhu
Yuxin Chen
Chenfeng Xu
Masayoshi Tomizuka
Chenran Li
45
0
0
14 Mar 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
77
13
0
14 Mar 2025
Implicit Bias-Like Patterns in Reasoning Models
Implicit Bias-Like Patterns in Reasoning Models
Messi H.J. Lee
Calvin K. Lai
LRM
61
0
0
14 Mar 2025
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?
Giacomo Camposampiero
Michael Hersche
Roger Wattenhofer
Abu Sebastian
Abbas Rahimi
LRM
58
2
0
14 Mar 2025
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Zixu Cheng
Jian Hu
Ziquan Liu
Chenyang Si
Wei Li
Shaogang Gong
LRM
83
2
0
14 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
65
0
0
14 Mar 2025
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
Gang Li
Jizhong Liu
Heinrich Dinkel
Yadong Niu
Junbo Zhang
Jian Luan
OffRL
LRM
ReLM
71
8
0
14 Mar 2025
A Review of DeepSeek Models' Key Innovative Techniques
Chengen Wang
Murat Kantarcioglu
VLM
OffRL
56
0
0
14 Mar 2025
Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms
Camilo Chacón Sartori
Christian Blum
56
0
0
14 Mar 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Z Li
ELM
75
3
0
13 Mar 2025
New Trends for Modern Machine Translation with Large Reasoning Models
Sinuo Liu
Chenyang Lyu
Mingyang Wu
Longyue Wang
Weihua Luo
Kaifu Zhang
Zifu Shang
LRM
71
3
0
13 Mar 2025
Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach
Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach
Afrar Jahin
Arif Hassan Zidan
Wei Zhang
Yu Bao
Tianming Liu
LRM
81
1
0
13 Mar 2025
LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions
Gaurav Kumar Gupta
Pranal Pande
65
0
0
13 Mar 2025
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
Xudong Tan
Peng Ye
Chongjun Tu
Jianjian Cao
Yaoxin Yang
Lin Zhang
Dongzhan Zhou
Tao Chen
VLM
64
0
0
13 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
73
12
0
13 Mar 2025
Transformers without Normalization
Jiachen Zhu
Xinlei Chen
Kaiming He
Yann LeCun
Zhuang Liu
ViT
OffRL
65
8
0
13 Mar 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang
Xiaoxuan He
Hongkun Pan
Xiyan Jiang
Yan Deng
...
Dacheng Yin
Fengyun Rao
Minfeng Zhu
Bo Zhang
Wei Chen
VLM
LRM
58
29
1
13 Mar 2025
Collaborative Speculative Inference for Efficient LLM Inference Serving
Luyao Gao
Jianchun Liu
Hongli Xu
Xichong Zhang
Yunming Liao
Liusheng Huang
46
0
0
13 Mar 2025
Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback
Derun Li
Jianwei Ren
Y. Wang
Xin Wen
Pengxiang Li
...
Zhongpu Xia
Peng Jia
Xianpeng Lang
Ningyi Xu
Hang Zhao
61
1
0
13 Mar 2025
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
Siyin Wang
Zhaoye Fei
Qinyuan Cheng
Shanghang Zhang
Panpan Cai
Jinlan Fu
Xipeng Qiu
61
1
0
13 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu
Dong Gong
Erdun Gao
Zhen Zhang
Zhen Zhang
Biwei Huang
Anton van den Hengel
Javen Qinfeng Shi
Javen Qinfeng Shi
241
0
0
12 Mar 2025
Reinforcement Learning is all You Need
Yongsheng Lian
ReLM
OffRL
LRM
70
0
0
12 Mar 2025
Global Position Aware Group Choreography using Large Language Model
Haozhou Pang
Tianwei Ding
Lanshan He
Qi Gan
SLR
72
0
0
12 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Dong Wang
Sercan Ö. Arik
Dong Wang
Hamed Zamani
Jiawei Han
RALM
ReLM
KELM
OffRL
AI4TS
LRM
89
38
0
12 Mar 2025
Learning richness modulates equality reasoning in neural networks
William L. Tong
Cengiz Pehlevan
49
0
0
12 Mar 2025
DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
Minjun Zhu
Yixuan Weng
Linyi Yang
Yue Zhang
ALM
LRM
73
4
0
11 Mar 2025
LightPlanner: Unleashing the Reasoning Capabilities of Lightweight Large Language Models in Task Planning
Weijie Zhou
Yi Peng
Manli Tao
Chaoyang Zhao
Honghui Dong
Ming Tang
Jinqiao Wang
LLMAG
LRM
52
0
0
11 Mar 2025
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
Pol G. Recasens
Ferran Agullo
Yue Zhu
Chen Wang
Eun Kyung Lee
Olivier Tardieu
Jordi Torres
Josep Ll. Berral
48
0
0
11 Mar 2025
Previous
123...121314151617
Next