ResearchTrend.AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning


22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM VLM OffRL AI4TS LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
KnowCoder-V2: Deep Knowledge Analysis
Zixuan Li
Wenxuan Liu
Long Bai
Chunmao Zhang
Wei Li
...
Bingbing Xu
Xuhui Jiang
Xiaolong Jin
Jiafeng Guo
Xueqi Cheng
42
0
0
07 Jun 2025
DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains
Zhihui Chen
Kai He
Yucheng Huang
Yunxiao Zhu
Mengling Feng
DeLMO MedIm
35
0
0
07 Jun 2025
Boosting LLM Reasoning via Spontaneous Self-Correction
Xutong Zhao
Tengyu Xu
Xuewei Wang
Zhengxing Chen
Di Jin
...
Yun He
Sinong Wang
Han Fang
Sarath Chandar
Chen Zhu
ReLM LRM KELM
38
0
0
07 Jun 2025
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
Chaoyang Wang
Zeyu Zhang
Haiyun Jiang
OffRL LRM
27
0
0
07 Jun 2025
Saffron-1: Safety Inference Scaling
Ruizhong Qiu
Gaotang Li
Tianxin Wei
Jingrui He
Hanghang Tong
LRM
40
0
0
06 Jun 2025
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Jiatao Gu
Tianrong Chen
David Berthelot
Huangjie Zheng
Yuyang Wang
Ruixiang Zhang
Laurent Dinh
Miguel Angel Bautista
Josh Susskind
Shuangfei Zhai
56
0
0
06 Jun 2025
Information Bargaining: Bilateral Commitment in Bayesian Persuasion
Yue Lin
Shuhui Zhu
William A Cunningham
Wenhao Li
Pascal Poupart
Hongyuan Zha
Baoxiang Wang
69
0
0
06 Jun 2025
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
Hengzhi Li
Brendon Jiang
Alexander Naehu
Regan Song
Justin Zhang
...
Steven-Shine Chen
Adithya Balachandran
Wei Dai
Rebecca Chang
Paul Pu Liang
ReLM LRM
75
0
0
06 Jun 2025
Corrector Sampling in Language Models
Itai Gat
Neta Shaul
Uriel Singer
Y. Lipman
KELM AI4TS
54
0
0
06 Jun 2025
BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions
Saptarshi Sengupta
Shuhua Yang
Paul Kwong Yu
Fali Wang
Suhang Wang
63
0
0
06 Jun 2025
Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey
Jiachen Zhu
Menghui Zhu
Renting Rui
Rong Shan
Congmin Zheng
...
Jianghao Lin
Weiwen Liu
Ruiming Tang
Yong Yu
Weinan Zhang
LLMAG ELM
60
0
0
06 Jun 2025
Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework
Lingyuan Liu
Mengxiang Zhang
62
0
0
06 Jun 2025
ProRefine: Inference-time Prompt Refinement with Textual Feedback
Deepak Pandita
Tharindu Cyril Weerasooriya
A. Shah
Christopher Homan
Wei Wei
LLMAG ReLM LRM
158
0
0
05 Jun 2025
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Xin Jin
Zhenguo Li
James T. Kwok
Yu Zhang
LRM
113
0
0
05 Jun 2025
ScaleRTL: Scaling LLMs with Reasoning Data and Test-Time Compute for Accurate RTL Code Generation
Chenhui Deng
Yun-Da Tsai
Guan-Ting Liu
Zhongzhi Yu
Haoxing Ren
LLMAG LRM
59
1
0
05 Jun 2025
TreeRPO: Tree Relative Policy Optimization
Zhicheng YANG
Zhijiang Guo
Yinya Huang
Xiaodan Liang
Yiwei Wang
Jing Tang
LRM
101
0
0
05 Jun 2025
A Reasoning-Based Approach to Cryptic Crossword Clue Solving
Martin Andrews
Sam Witteveen
ReLM ELM LRM
104
0
0
05 Jun 2025
Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation
Keyu Zhao
Fengli Xu
Yong Li
LRM
109
0
0
05 Jun 2025
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search
A. Jain
Vibhakar Mohta
Subin Kim
Atiksh Bhardwaj
Juntao Ren
Yunhai Feng
Sanjiban Choudhury
Gokul Swamy
OffRL
130
0
0
05 Jun 2025
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
Yujun Zhou
Jiayi Ye
Zipeng Ling
Yufei Han
Yue Huang
...
Zhenwen Liang
Kehan Guo
Taicheng Guo
Xiangqi Wang
Xiangliang Zhang
ReLM LRM
140
1
0
05 Jun 2025
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang
Zhuofan Zhang
Ziyu Zhu
Yue Fan
Jing Xiong
Pengxiang Li
Xiaojian Ma
Qing Li
111
0
0
05 Jun 2025
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design
Lin Sun
Weihong Lin
Jinzhu Wu
Yongfu Zhu
Xiaoqi Jian
...
Change Jia
Linglin Zhang
Sai-er Hu
Yuhan Wu
Xiangzheng Zhang
ELM LRM
144
0
0
05 Jun 2025
LLMs for sensory-motor control: Combining in-context and iterative learning
J. Carvalho
S. Nolfi
LM&Ro
115
0
0
05 Jun 2025
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Quan Shi
Carlos E. Jimenez
Shunyu Yao
Nick Haber
Diyi Yang
Karthik Narasimhan
49
0
0
05 Jun 2025
Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model
Zelu Qi
Ping Shi
C. Zhang
Shuqi Wang
F. Zhao
Da Pan
Zefeng Ying
EGVM VGen
165
0
0
05 Jun 2025
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Yifan Sun
Jingyan Shen
Yibin Wang
Tianyu Chen
Zhendong Wang
Mingyuan Zhou
Huan Zhang
107
0
0
05 Jun 2025
Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction
Zhihao Tang
Chaozhuo Li
Litian Zhang
Xi Zhang
DiffM MedIm
60
9
0
05 Jun 2025
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
Yuyang Wanyan
Xi Zhang
Haiyang Xu
Haowei Liu
Junyang Wang
...
Ming Yan
Fei Huang
Xiaoshan Yang
W. Dong
Changsheng Xu
LLMAG LRM
189
0
0
05 Jun 2025
On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models
Xingwu Chen
Tianle Li
Difan Zou
LRM
115
0
0
05 Jun 2025
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan
Zhuoming Chen
Haizhong Zheng
Yang Zhou
Emma Strubell
Beidi Chen
127
0
0
05 Jun 2025
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark
Junjie Xing
Yeye He
Mengyu Zhou
Haoyu Dong
Shi Han
Lingjiao Chen
Dongmei Zhang
S. Chaudhuri
H. V. Jagadish
LMTD ELM LRM
48
0
0
05 Jun 2025
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
Lidong Lu
Guo Chen
Z. Li
Yicheng Liu
Tong Lu
VLM LRM
109
0
0
05 Jun 2025
Customizing Speech Recognition Model with Large Language Model Feedback
Shaoshi Ling
Guoli Ye
30
0
0
05 Jun 2025
ADAMIX: Adaptive Mixed-Precision Delta-Compression with Quantization Error Optimization for Large Language Models
Boya Xiong
Shuo Wang
Weifeng Ge
Guanhua Chen
Yun-Nung Chen
MQ
38
0
0
05 Jun 2025
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Baihe Huang
Shanda Li
Tianhao Wu
Yiming Yang
Ameet Talwalkar
Kannan Ramchandran
Michael I. Jordan
Jiantao Jiao
LRM
126
0
0
05 Jun 2025
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
Kejian Zhu
Zhuoran Jin
Hongbang Yuan
Jiachun Li
Shangqing Tu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
VLM LRM
95
0
0
04 Jun 2025
Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models
Ruiqi Zhang
Changyi Xiao
Yixin Cao
LRM
101
0
0
04 Jun 2025
Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance
Xixi Wang
Miguel Costa
Jordanka Kovaceva
Shuai Wang
Francisco C. Pereira
LMTD
59
0
0
04 Jun 2025
EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation
Jinghan Jia
Hadi Reisizadeh
Chongyu Fan
Nathalie Baracaldo
Mingyi Hong
Sijia Liu
LRM
142
0
0
04 Jun 2025
Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems
Yuxin Zhang
Yan Wang
Yongrui Chen
Shenyu Zhang
Xinbang Dai
Sheng Bi
Guilin Qi
127
0
0
04 Jun 2025
SAGE: Specification-Aware Grammar Extraction for Automated Test Case Generation with LLMs
Aditi
Hyunwoo Park
Sicheol Sung
Yo-Sub Han
Sang-Ki Ko
19
0
0
04 Jun 2025
CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design
Yifeng Xiao
Yurong Xu
Ning Yan
Masood S. Mortazavi
Pierluigi Nuzzo
118
0
0
04 Jun 2025
Enhancing Decision-Making of Large Language Models via Actor-Critic
Heng Dong
Kefei Duan
Chongjie Zhang
LLMAG
33
0
0
04 Jun 2025
Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models
Soumya Suvra Ghosal
Souradip Chakraborty
Avinash Reddy
Yifu Lu
Mengdi Wang
Dinesh Manocha
Furong Huang
Mohammad Ghavamzadeh
Amrit Singh Bedi
ReLM LRM
108
0
0
04 Jun 2025
How Far Are We from Predicting Missing Modalities with Foundation Models?
Guanzhou Ke
Yi Xie
Xiaoli Wang
Guoqing Chao
Bo Wang
Shengfeng He
VLM
115
0
0
04 Jun 2025
Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond
X-D Cai
Sihan Hu
Tao Wang
Yuan Huang
Pan Zhang
Youjin Deng
Kun Chen
LRM
88
0
0
04 Jun 2025
ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning
Feng Han
Yang Jiao
Shaoxiang Chen
Junhao Xu
Jingjing Chen
Yu-Gang Jiang
DiffM LRM
78
0
0
04 Jun 2025
AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism
Zhepei Wei
Wei-Lin Chen
Xinyu Zhu
Yu Meng
OffRL
121
0
0
04 Jun 2025
DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience
Runxiang Wang
Boxiao Wang
Kai Li
Yifan Zhang
Jian Cheng
40
0
0
04 Jun 2025
Rectified Sparse Attention
Yutao Sun
Tianzhu Ye
Li Dong
Yuqing Xia
Jian Chen
Yizhao Gao
S. Cao
Jianyong Wang
Furu Wei
113
1
0
04 Jun 2025