DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning


22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
EvolvTrip: Enhancing Literary Character Understanding with Temporal Theory-of-Mind Graphs
Bohao Yang
Hainiu Xu
Jinhua Du
Ze Li
Yulan He
Chenghua Lin
47
0
0
16 Jun 2025
FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design
Kai Lan
Jiayong Zhu
Jiangtong Li
Dawei Cheng
Guang-Sheng Chen
Changjun Jiang
LRM
36
0
0
16 Jun 2025
ExtendAttack: Attacking Servers of LRMs via Extending Reasoning
Zhenhao Zhu
Yue Liu
Yingwei Ma
Hongcheng Gao
Nuo Chen
Yanpei Guo
Wenjie Qu
Huiying Xu
Xinzhong Zhu
Jiaheng Zhang
AAML, LRM
40
0
0
16 Jun 2025
Multipole Attention for Efficient Long Context Reasoning
Coleman Hooper
Sebastian Zhao
Luca Manolache
Sehoon Kim
Michael W. Mahoney
Y. Shao
Kurt Keutzer
Amir Gholami
OffRL, LRM
35
0
0
16 Jun 2025
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Runpeng Yu
Qi Li
Xinchao Wang
DiffM, AI4CE
61
0
0
16 Jun 2025
Position: Pause Recycling LoRAs and Prioritize Mechanisms to Uncover Limits and Effectiveness
Mei-Yen Chen
Thi Thu Uyen Hoang
Michael Hahn
M. Sarfraz
MoMe
35
0
0
16 Jun 2025
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
Kaiyuan Chen
Y. Ren
Yang Liu
Xiaobo Hu
Haotong Tian
...
Yuan Jiang
Zexuan Liu
Zihan Yin
Zijian Ma
Zhiwen Mo
53
0
0
16 Jun 2025
Document-Level Tabular Numerical Cross-Checking: A Coarse-to-Fine Approach
Chaoxu Pang
Yixuan Cao
Ganbin Zhou
Hongwei Bran Li
Ping Luo
LMTD
52
0
0
16 Jun 2025
Cross-architecture universal feature coding via distribution alignment
Changsheng Gao
Shan Liu
Feng Wu
Weisi Lin
OOD
9
0
0
15 Jun 2025
$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts
Mert Cemri
Nived Rajaraman
Rishabh Tiwari
Xiaoxuan Liu
Kurt Keutzer
Ion Stoica
Kannan Ramchandran
Ahmad Beirami
Ziteng Sun
LRM
29
0
0
15 Jun 2025
Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills
Changsheng Wang
Chongyu Fan
Yihua Zhang
Jinghan Jia
Dennis Wei
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU, KELM, LRM
65
0
0
15 Jun 2025
QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm
Qirui Zhou
Shaohui Peng
Weiqiang Xiong
Haixin Chen
Yuanbo Wen
...
Ke Gao
Ruizhi Chen
Yanjun Wu
Chen Zhao
Y. Chen
LRM
37
0
0
14 Jun 2025
Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions
Asghar Ghorbani
Hanieh Fattahi
46
0
0
14 Jun 2025
Performance Plateaus in Inference-Time Scaling for Text-to-Image Diffusion Without External Models
Changhyun Choi
S. Kim
H. Jin Kim
DiffM
28
0
0
14 Jun 2025
Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models
Kaiyuan Liu
Chen Shen
Zhanwei Zhang
Junjie Liu
Xiaosong Yuan
Jieping Ye
ReLM, LRM
57
0
0
14 Jun 2025
Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics
Asifullah Khan
Muhammad Zaeem Khan
Saleha Jamshed
Sadia Ahmad
Aleesha Zainab
Kaynat Khatib
Faria Bibi
Abdul Rehman
OffRL, LRM
42
0
0
14 Jun 2025
Similarity as Reward Alignment: Robust and Versatile Preference-based Reinforcement Learning
Sara Rajaram
R. J. Cotton
Fabian H. Sinz
29
0
0
14 Jun 2025
VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?
Jiachen Yu
Yufei Zhan
Ziheng Wu
Yousong Zhu
Jinqiao Wang
Minghui Qiu
VLM, LRM
36
0
0
13 Jun 2025
Prioritizing Alignment Paradigms over Task-Specific Model Customization in Time-Series LLMs
Wei Li
Yunyao Cheng
Xinli Hao
Chaohong Ma
Yuxuan Liang
Bin Yang
Christian S. Jensen
Xiaofeng Meng
AI4TS
47
0
0
13 Jun 2025
Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs
Linlin Wang
Tianqing Zhu
Laiqiao Qin
Longxiang Gao
Wanlei Zhou
31
0
0
13 Jun 2025
TongSearch-QR: Reinforced Query Reasoning for Retrieval
Xubo Qin
Jun Bai
Jiaqi Li
Zixia Jia
Zilong Zheng
ReLM, RALM, LRM
61
0
0
13 Jun 2025
RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning
Yu Wang
Shiwan Zhao
Ming Fan
Zhihu Wang
Y. Zhang
Xicheng Zhang
Zhengfan Wang
Heyuan Huang
Ting Liu
VLM, LRM
45
0
0
13 Jun 2025
Schema-R1: A reasoning training approach for schema linking in Text-to-SQL Task
Wuzhenghong Wen
Su Pan
Yuwei Sun
ReLM, LRM
78
0
0
13 Jun 2025
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
Xudong Zhu
Jiachen Jiang
Mohammad Mahdi Khalili
Zhihui Zhu
ReLM, LM&Ro, LRM
65
0
0
13 Jun 2025
LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs
Yanan Cai
Ahmed Salem
Besmira Nushi
M. Russinovich
LLMAG, LRM
136
0
0
12 Jun 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu
Y. Wu
Meng Chu
Zhifei Ren
Z. Huang
...
Conghui He
Yu Qiao
Yali Wang
Yi Wang
L. Wang
LRM
140
0
0
12 Jun 2025
Self-Adapting Language Models
Adam Zweiger
Jyothish Pari
Han Guo
Ekin Akyürek
Yoon Kim
Pulkit Agrawal
KELM, LRM
155
0
0
12 Jun 2025
WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models
Qiyue Yin
Pei Xu
Qiaozhe Li
Shengda Liu
S. Shen
...
Lei Cui
Chengxin Yan
Jie Sun
Xiangquan Tang
K. Huang
LLMAG, ELM, LRM
124
0
0
12 Jun 2025
Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs
Yucong Luo
Yitong Zhou
Mingyue Cheng
Jiahao Wang
Daoyu Wang
Tingyue Pan
Jintao Zhang
AI4TS, LRM
129
0
0
12 Jun 2025
VideoDeepResearch: Long Video Understanding With Agentic Tool Using
Huaying Yuan
Zheng Liu
Junjie Zhou
Ji-Rong Wen
Zhicheng Dou
VLM
139
0
0
12 Jun 2025
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
Qingyan Wei
Y. Zhang
Zhiyuan Liu
Dongrui Liu
Linfeng Zhang
DiffM, AI4CE
159
0
0
12 Jun 2025
Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
Jikai Jin
Vasilis Syrgkanis
Sham Kakade
Hanlin Zhang
ELM
142
1
0
12 Jun 2025
Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
Luke Rowe
Rodrigue de Schaetzen
Roger Girgis
C. Pal
Liam Paull
MLLM, VLM
36
0
0
12 Jun 2025
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
Xiaozhe Li
Jixuan Chen
Xinyu Fang
Shengyuan Ding
Haodong Duan
Qingwen Liu
Kai-xiang Chen
LLMAG, LRM
120
0
0
12 Jun 2025
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving
Vincenzo Colle
Mohamed Sana
Nicola Piovesan
A. De Domenico
Fadhel Ayed
Merouane Debbah
85
0
0
12 Jun 2025
Provably Learning from Language Feedback
Wanqiao Xu
Allen Nie
Ruijie Zheng
Aditya Modi
Adith Swaminathan
Ching-An Cheng
166
0
0
12 Jun 2025
Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges
Jintao Liang
Gang Su
Huifeng Lin
You Wu
Rui Zhao
Ziyue Li
3DV, LRM
143
0
0
12 Jun 2025
OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics
Yaoming Zhu
Junxin Wang
Yiyang Li
Lin Qiu
Zongyu Wang
...
Xuezhi Cao
Yuhuai Wei
Mingshi Wang
Xunliang Cai
Rong Ma
LRM
131
0
0
12 Jun 2025
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
Y. Jiang
Yuwen Xiong
Yufeng Yuan
Chao Xin
Wenyuan Xu
Yu Yue
Qianchuan Zhao
Lin Yan
LRM
135
0
0
12 Jun 2025
LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis
Reza Fayyazi
Michael Zuzak
S. Yang
39
0
0
12 Jun 2025
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
Zijie Wu
Chaohui Yu
Fan Wang
Xiang Bai
AI4CE
65
0
0
11 Jun 2025
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Yu Sun
Xingyu Qian
Weiwen Xu
Hao Zhang
Chenghao Xiao
Long Li
Yu Rong
Wenbing Huang
Qifeng Bai
Tingyang Xu
LRM
79
0
0
11 Jun 2025
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Xiyao Wang
Zhengyuan Yang
Chao Feng
Yongyuan Liang
Yuhang Zhou
...
Chung-Ching Lin
Kevin Lin
Linjie Li
Furong Huang
L. Wang
OffRL, LRM
73
0
0
11 Jun 2025
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang
Zhenhua Liu
Jiaheng Wei
Xuanwu Yin
Dong Li
E. Barsoum
LRM
92
0
0
11 Jun 2025
CoRT: Code-integrated Reasoning within Thinking
Chengpeng Li
Zhengyang Tang
Ziniu Li
Mingfeng Xue
Keqin Bao
...
Ruoyu Sun
Benyou Wang
Xiang Wang
Junyang Lin
Dayiheng Liu
LLMAG, OffRL, ReLM, LRM
85
0
0
11 Jun 2025
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
Zhenran Xu
Yiyu Wang
Xue Yang
Longyue Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
AI4TS, LRM
85
0
0
11 Jun 2025
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang
Yuwei An
Hongyi Liu
Tianqi Chen
Beidi Chen
SyDa, LRM
189
0
0
11 Jun 2025
3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
Xiaotang Gai
Jiaxiang Liu
Yichen Li
Zijie Meng
Jian Wu
Zuozhu Liu
VGen
27
0
0
11 Jun 2025
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
Yuting Li
Lai Wei
Kaipeng Zheng
Jingyuan Huang
Linghe Kong
Lichao Sun
Weiran Huang
AAML, LRM, VLM
89
0
0
11 Jun 2025
Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training
Shurui Gui
Shuiwang Ji
LRM
83
0
0
11 Jun 2025