DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Z. Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Hairu Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Renqi Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
    ReLM
    VLM
    OffRL
    AI4TS
    LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 812 papers shown
Automated Movie Generation via Multi-Agent CoT Planning
Weijia Wu
Zeyu Zhu
Mike Zheng Shou
VGen
82
2
0
10 Mar 2025
AuthorMist: Evading AI Text Detectors with Reinforcement Learning
Isaac David
Arthur Gervais
DeLMO
55
0
0
10 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
94
33
0
10 Mar 2025
Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
Fan Yin
Zifeng Wang
I-Hung Hsu
Jun Yan
Ke Jiang
...
L. Le
Kai-Wei Chang
Chen-Yu Lee
Hamid Palangi
Tomas Pfister
60
4
0
10 Mar 2025
Dynamic Path Navigation for Motion Agents with LLM Reasoning
Yubo Zhao
Qi Wu
Yifan Wang
Yu-Wing Tai
Chi-Keung Tang
LRM
LLMAG
253
0
0
10 Mar 2025
X-GAN: A Generative AI-Powered Unsupervised Model for High-Precision Segmentation of Retinal Main Vessels toward Early Detection of Glaucoma
Cheng Huang
Weizheng Xie
Tsengdar Lee
Jui-Kai Wang
Karanjit S Kooner
Jia Zhang
219
1
0
09 Mar 2025
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
Yuqi Liu
Bohao Peng
Zhisheng Zhong
Zihao Yue
Fanbin Lu
Bei Yu
Jiaya Jia
LRM
VLM
55
13
0
09 Mar 2025
Reinforcement Learning with Verifiable Rewards: GRPO's Effective Loss, Dynamics, and Success Amplification
Youssef Mroueh
OffRL
46
5
0
09 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Yuhang Liu
Jin Jiang
Hao Fei
Jian Shao
Yueting Zhuang
LRM
ReLM
53
4
0
09 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
61
47
0
09 Mar 2025
Effectiveness of Zero-shot-CoT in Japanese Prompts
Shusuke Takayama
Ian Frank
LRM
49
0
0
09 Mar 2025
Agent models: Internalizing Chain-of-Action Generation into Reasoning models
Yuxiang Zhang
Yuqi Yang
Jiangming Shu
Xinyan Wen
Jitao Sang
LRM
LLMAG
LM&Ro
51
1
0
09 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
82
0
0
08 Mar 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Kaixin Cai
...
Yu-Jie Yuan
Jiawei Han
Lanqing Hong
Hang Xu
Xiaodan Liang
ReLM
LRM
64
7
0
08 Mar 2025
Accelerating Earth Science Discovery via Multi-Agent LLM Systems
Dmitrii Pantiukhin
Boris Shapkin
Ivan Kuznetsov
Antonia Anna Jost
Nikolay Koldunov
AI4CE
LLMAG
91
1
0
07 Mar 2025
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning
Jiaxing Zhao
Xihan Wei
Liefeng Bo
OffRL
48
15
0
07 Mar 2025
The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence
Noah Mamie
Susie Xi Rao
LLMAG
AI4CE
56
0
0
07 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang
Xiongwei Zhao
Qihao Sun
Ke Wang
Ao Chen
Peng Kang
3DGS
89
0
0
07 Mar 2025
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning
Zhong Ji
Weilong Cao
Yan Zhang
Yanwei Pang
Jungong Han
Xuelong Li
DiffM
VLM
52
0
0
06 Mar 2025
DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module
Krish Sharma
Niyar R. Barman
Nicholas M. Asher
Akshay Chaturvedi
LRM
AIMat
75
0
0
06 Mar 2025
Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators
Blaine Quackenbush
P. Atzberger
3DPC
AI4CE
73
0
0
06 Mar 2025
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
Yi Shen
Jingyang Zhang
Jieyun Huang
Shuming Shi
Wenjing Zhang
Jiangze Yan
Rongjia Du
Ning Wang
Kai Wang
LRM
80
29
0
06 Mar 2025
KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease
Yongchao Long
Chao Yang
Gongzheng Tang
Jinwei Wang
Zhun Sui
Yuxi Zhou
Shenda Hong
Luxia Zhang
RALM
61
0
0
06 Mar 2025
Generalized Interpolating Discrete Diffusion
Dimitri von Rutte
J. Fluri
Yuhui Ding
Antonio Orvieto
Bernhard Scholkopf
Thomas Hofmann
DiffM
69
0
0
06 Mar 2025
Learning Generalizable Language-Conditioned Cloth Manipulation from Long Demonstrations
Hanyi Zhao
Jinxuan Zhu
Zihao Yan
Yichen Li
Yuhong Deng
Xueqian Wang
SSL
57
0
0
06 Mar 2025
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li
Pengfei Zheng
Shuang Chen
Zewei Xu
Yuanhao Lai
Yunfei Du
Zehao Wang
MoE
217
0
0
06 Mar 2025
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
Tianyu Cui
Song-Jun Xu
Artem Moskalev
Shuwei Li
Tommaso Mansi
Mangal Prakash
Rui Liao
BDL
73
0
0
06 Mar 2025
Can Frontier LLMs Replace Annotators in Biomedical Text Mining? Analyzing Challenges and Exploring Solutions
Yichong Zhao
Susumu Goto
65
0
0
05 Mar 2025
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill
Ashish Sabharwal
57
6
0
05 Mar 2025
Knowledge Augmentation in Federation: Rethinking What Collaborative Learning Can Bring Back to Decentralized Data
Wentai Wu
Ligang He
Saiqin Long
Ahmed M. Abdelmoniem
Yingliang Wu
Rui Mao
62
0
0
05 Mar 2025
Towards Understanding Distilled Reasoning Models: A Representational Approach
David D. Baek
Max Tegmark
LRM
80
3
0
05 Mar 2025
RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction
Fenglin Liu
Jinge Wu
Hongjian Zhou
Xiao Gu
Soheila Molaei
A. Thakur
Lei A. Clifton
Honghan Wu
David Clifton
LM&MA
46
0
0
05 Mar 2025
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models
Joykirat Singh
Tanmoy Chakraborty
A. Nambi
AI4Cl
LRM
ReLM
60
1
0
04 Mar 2025
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Paul Janson
Vaibhav Singh
Paria Mehrbod
Adam Ibrahim
Irina Rish
Eugene Belilovsky
Benjamin Thérien
CLL
78
0
0
04 Mar 2025
Learning from Failures in Multi-Attempt Reinforcement Learning
Stephen Chung
Wenyu Du
Jie Fu
LRM
42
1
0
04 Mar 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie
Mingbao Lin
Zichen Liu
Pengcheng Wu
Shuicheng Yan
Chunyan Miao
AuLLM
OffRL
LRM
87
9
0
04 Mar 2025
Comparative Analysis of OpenAI GPT-4o and DeepSeek R1 for Scientific Text Categorization Using Prompt Engineering
A. Maiti
Samuel Adewumi
Temesgen Alemayehu Tikure
Zichun Wang
Niladri Sengupta
Anastasiia Sukhanova
Ananya Jana
ELM
VLM
47
1
0
03 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM
LRM
93
39
0
03 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Wenjie Qu
Xiren Zhou
MoE
SyDa
78
32
0
03 Mar 2025
Adaptively profiling models with task elicitation
Davis Brown
Prithvi Balehannina
Helen Jin
Shreya Havaldar
Hamed Hassani
Eric Wong
ALM
ELM
114
0
0
03 Mar 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu
Zeyi Sun
Yuhang Zang
Xiaoyi Dong
Yuhang Cao
Haodong Duan
Dahua Lin
Jiaqi Wang
ObjD
VLM
LRM
75
49
0
03 Mar 2025
Using (Not so) Large Language Models for Generating Simulation Models in a Formal DSL -- A Study on Reaction Networks
J. N. Kreikemeyer
Miłosz Jankowski
Pia Wilsdorf
A. Uhrmacher
77
0
0
03 Mar 2025
Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty
Yao Wang
Mingxuan Cui
Arthur Jiang
77
0
0
03 Mar 2025
What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret
Yufeng Yuan
Yu Yue
Ruofei Zhu
Tiantian Fan
Lin Yan
OffRL
67
13
0
03 Mar 2025
Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning
Wenjie Wu
Yongcheng Jing
Yingjie Wang
Wenbin Hu
Dacheng Tao
RALM
LRM
74
2
0
03 Mar 2025
Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution
Keliang Li
Tianhua Zhang
Yunxiang Li
Hongyin Luo
Abdalla Moustafa
Xixin Wu
James Glass
Helen Meng
68
0
0
03 Mar 2025
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
Meghana Arakkal Rajeev
Rajkumar Ramamurthy
Prapti Trivedi
Vikas Yadav
Oluwanifemi Bamgbose
Sathwik Tejaswi Madhusudan
James Zou
Nazneen Rajani
AAML
LRM
55
2
0
03 Mar 2025
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar
Enzhuo Zhang
Zhenshi Li
Feng-Xue Gu
Yanglangxing He
Pengfeng Xiao
Xueliang Zhang
52
3
0
02 Mar 2025
Output Length Effect on DeepSeek-R1's Safety in Forced Thinking
Xuying Li
Zhuo Li
Yuji Kosuga
Victor Bian
AAML
LRM
66
4
0
02 Mar 2025
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Toby Simonds
Akira Yoshiyama
LRM
48
3
0
02 Mar 2025