arXiv:2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
Reinforced Latent Reasoning for LLM-based Recommendation
Yang Zhang
Wenxin Xu
Xiaoyan Zhao
Wenjie Wang
Fuli Feng
Xiangnan He
Tat-Seng Chua
OffRL, LRM
64
2
0
25 May 2025
Do Large Language Models (Really) Need Statistical Foundations?
Weijie Su
287
0
0
25 May 2025
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang
Lingling Zhang
Jie Ma
Han Lai
Fangzhi Xu
Yifei Li
Wenjun Wu
Yaqiang Wu
Jun Liu
LRM
47
0
0
25 May 2025
Benchmarking and Rethinking Knowledge Editing for Large Language Models
Guoxiu He
Xin Song
Futing Wang
Aixin Sun
KELM
54
0
0
24 May 2025
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
Hao Gu
Lujun Li
Zheyu Wang
B. Liu
Qiyuan Zhu
Sirui Han
Yike Guo
MQ
34
0
0
24 May 2025
Steering LLM Reasoning Through Bias-Only Adaptation
Viacheslav Sinii
Alexey Gorbatovski
Artem Cherepanov
Boris Shaposhnikov
Nikita Balagansky
Daniil Gavrilov
LLMSV, LRM
49
0
0
24 May 2025
VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis
Tina Khezresmaeilzadeh
Parsa Razmara
Seyedarmin Azizi
Mohammad Erfan Sadeghi
Erfan Baghaei Portaghloo
AI4TS
295
0
0
24 May 2025
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
C. Wang
Xiaoran Pan
Zihao Pan
Haofan Wang
Yiren Song
LRM
160
0
0
24 May 2025
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
Ruichen Zhang
Rana Muhammad Shahroz Khan
Zhen Tan
Dawei Li
Song Wang
Tianlong Chen
LRM
68
0
0
24 May 2025
Mitigating Deceptive Alignment via Self-Monitoring
Jiaming Ji
Wenqi Chen
Kaile Wang
Donghai Hong
Sitong Fang
...
Jiayi Zhou
Juntao Dai
Sirui Han
Yike Guo
Yaodong Yang
LRM
62
2
0
24 May 2025
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning
Ye Mo
Zirui Shao
Kai Ye
Xianwei Mao
Bo Zhang
...
Gang Huang
Kehan Chen
Zhou Huan
Zixu Yan
Sheng Zhou
LRM
62
0
0
24 May 2025
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Zhenglin Huang
Tianxiao Li
Xiangtai Li
Haiquan Wen
Yiwei He
...
Hao Fei
Xi Yang
Xiaowei Huang
Bei Peng
Guangliang Cheng
89
0
0
24 May 2025
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
Haoyuan Sun
Jiaqi Wu
Bo Xia
Yifu Luo
Yifei Zhao
Kai Qin
Xufei Lv
Tiantian Zhang
Yongzhe Chang
Xueqian Wang
OffRL, LRM
219
0
0
24 May 2025
Hybrid Latent Reasoning via Reinforcement Learning
Zhenrui Yue
Bowen Jin
Huimin Zeng
Honglei Zhuang
Zhen Qin
Chang Jo Kim
Lanyu Shang
Jiawei Han
Dong Wang
OffRL, BDL, LRM
80
0
0
24 May 2025
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Guanxing Lu
Wenkai Guo
Chubin Zhang
Yuheng Zhou
Haonan Jiang
Zifeng Gao
Yansong Tang
Ziwei Wang
OffRL
135
0
0
24 May 2025
LLM-QFL: Distilling Large Language Model for Quantum Federated Learning
Dev Gurung
Shiva Raj Pokhrel
FedML
219
0
0
24 May 2025
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
Wenlong Deng
Yi Ren
Muchen Li
Danica J. Sutherland
Xiaoxiao Li
Christos Thrampoulidis
74
0
0
24 May 2025
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou
Jiaming Ji
Boyuan Chen
Jiapeng Sun
Wenqi Chen
Donghai Hong
Sirui Han
Yike Guo
Yaodong Yang
36
1
0
24 May 2025
Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services
Guoheng Sun
Ziyao Wang
Xuandong Zhao
Bowei Tian
Zheyu Shen
Yexiao He
Jinming Xing
Ang Li
108
0
0
24 May 2025
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Xiaojun Guo
Ang Li
Yifei Wang
Stefanie Jegelka
Yisen Wang
OffRL, ReLM, LRM
109
0
0
24 May 2025
Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment?
Hongzheng Yang
Yongqiang Chen
Zeyu Qin
Tongliang Liu
Chaowei Xiao
Kun Zhang
Bo Han
LLMSV
44
0
0
24 May 2025
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
Sicheng Feng
Song Wang
Shuyi Ouyang
Lingdong Kong
Zikai Song
Jianke Zhu
Huan Wang
Xinchao Wang
LRM
120
0
0
24 May 2025
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting
Shijue Huang
Hongru Wang
Wanjun Zhong
Zhaochen Su
Jiazhan Feng
Bowen Cao
Yi R. Fung
OffRL, LRM
169
2
0
24 May 2025
Large Language Models in the Task of Automatic Validation of Text Classifier Predictions
Aleksandr Tsymbalov
61
0
0
24 May 2025
Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution
Jiawei Du
Jinlong Wu
Yuzheng Chen
Yucheng Hu
Bing Li
Joey Tianyi Zhou
255
0
0
23 May 2025
InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO
Xueji Fang
Liyuan Ma
Zhiyang Chen
Mingyuan Zhou
Guo-Jun Qi
VGen
258
0
0
23 May 2025
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
Ziwei Zhou
Rui Wang
Zuxuan Wu
AuLLM, VGen
82
0
0
23 May 2025
Scaling Image and Video Generation via Test-Time Evolutionary Search
Haoran He
Jiajun Liang
X. Wang
Pengfei Wan
Di Zhang
Kun Gai
Ling Pan
DiffM
253
0
0
23 May 2025
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
Yutong Chen
Jiandong Gao
Ji Wu
ALM
232
0
0
23 May 2025
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang
Jin Peng Zhou
Jonathan D. Chang
Zhaolin Gao
Nathan Kallus
Kianté Brantley
Wen Sun
LRM
98
1
0
23 May 2025
Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems
Jiayi Geng
Howard Chen
Dilip Arumugam
Thomas L. Griffiths
115
0
0
23 May 2025
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
198
0
0
23 May 2025
First Finish Search: Efficient Test-Time Scaling in Large Language Models
Aradhye Agarwal
Ayan Sengupta
Tanmoy Chakraborty
ReLM, RALM, ALM, LRM
118
0
0
23 May 2025
Advertising in AI systems: Society must be vigilant
Menghua Wu
Yujia Bao
107
0
0
23 May 2025
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
Xiaoyi Zhang
Zhaoyang Jia
Zongyu Guo
Jiahao Li
Bin Li
Houqiang Li
Yan Lu
217
0
0
23 May 2025
A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models
Yanting Miao
William Loh
Suraj Kothawade
Pascal Poupart
57
0
0
23 May 2025
PMOA-TTS: Introducing the PubMed Open Access Textual Times Series Corpus
Shahriar Noroozizadeh
Sayantan Kumar
George H. Chen
Jeremy C. Weiss
65
0
0
23 May 2025
PD$^3$: A Project Duplication Detection Framework via Adapted Multi-Agent Debate
Dezheng Bao
Yueci Yang
Xin Chen
Zhengxuan Jiang
Zeguo Fei
...
Xuanwen Huang
Junru Chen
Chutian Yu
Xiang Yuan
Yang Yang
212
0
0
23 May 2025
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL, LRM
276
3
0
23 May 2025
Outcome-based Reinforcement Learning to Predict the Future
Benjamin Turtel
Danny Franklin
Kris Skotheim
Luke Hewitt
Philipp Schoenegger
OffRL, AI4TS
91
0
0
23 May 2025
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
Jinbang Huang
Yixin Xiao
Zhanguang Zhang
Mark Coates
Jianye Hao
Yingxue Zhang
LM&Ro, LRM
93
0
0
23 May 2025
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma
Linge Du
Xuyang Shen
Shaoxiang Chen
Pengfei Li
Qibing Ren
Lizhuang Ma
Yuchao Dai
Pengfei Liu
Junjie Yan
OffRL, LRM
139
0
0
23 May 2025
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Fanqi Wan
Weizhou Shen
Shengyi Liao
Yingcheng Shi
Chenliang Li
Ziyi Yang
Ji Zhang
Fei Huang
Jingren Zhou
Ming Yan
OffRL, LLMAG, ReLM, LRM
112
0
0
23 May 2025
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu
Xiaobo Xia
Weixiang Zhao
Manyi Zhang
Xianzhi Yu
Xiu Su
Shuo Yang
See-Kiong Ng
Tat-Seng Chua
KELM, LRM
108
0
0
23 May 2025
Two-Stage Regularization-Based Structured Pruning for LLMs
Mingkuan Feng
Jinyang Wu
Siyuan Liu
Shuai Zhang
Hongjian Fang
Ruihan Jin
Feihu Che
Pengpeng Shao
Zhengqi Wen
57
0
0
23 May 2025
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens
Xixian Yong
Xiao Zhou
Yingying Zhang
Jinlin Li
Yefeng Zheng
X. Wu
LRM
83
0
0
23 May 2025
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval
Zehua Pei
Ying Zhang
Hui-Ling Zhen
Xianzhi Yu
Wulong Liu
Sinno Jialin Pan
Mingxuan Yuan
Bei Yu
MoE
68
0
0
23 May 2025
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
Yue Jiang
Jichu Li
Yang Liu
Jinjie Wei
F. I. S. Kevin Zhou
Quyu Kong
MLLM
69
0
0
23 May 2025
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen
Xinyin Ma
Gongfan Fang
Ruonan Yu
Xinchao Wang
LRM
175
1
0
23 May 2025
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
Xueyang Li
Jiahao Li
Yu Song
Yunzhong Lou
Xiangdong Zhou
63
0
0
23 May 2025