ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2501.12948
  4. Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
    ReLMVLMOffRLAI4TSLRM
ArXiv (abs)PDFHTML

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
Title
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
Bo Shen
Cici
Dawei Zhu
...
Yun Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoEReLMLRMAI4CE
186
7
0
12 May 2025
Assessing the Chemical Intelligence of Large Language Models
Assessing the Chemical Intelligence of Large Language Models
Nicholas T. Runcie
Charlotte M. Deane
Fergus Imrie
ELMLRM
140
0
0
12 May 2025
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang
Zijian Huang
Chenshun Ni
Ziyang Xiong
Jiasi Chen
Samet Oymak
ReLMLRM
191
3
0
12 May 2025
REMEDI: Relative Feature Enhanced Meta-Learning with Distillation for Imbalanced Prediction
REMEDI: Relative Feature Enhanced Meta-Learning with Distillation for Imbalanced Prediction
Fei Liu
Huanhuan Ren
Yu Guan
Xiuxu Wang
Wang Lv
Zhiqiang Hu
Y. Chen
63
0
0
12 May 2025
Evaluating Large Language Models for Real-World Engineering Tasks
Evaluating Large Language Models for Real-World Engineering Tasks
Rene Heesch
Sebastian Eilermann
Alexander Windmann
Alexander Diedrich
Philipp Rosenthal
Oliver Niggemann
ELM
87
0
0
12 May 2025
Injecting Knowledge Graphs into Large Language Models
Injecting Knowledge Graphs into Large Language Models
Erica Coppolillo
107
0
0
12 May 2025
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang
Weijie Xu
Fanyou Wu
Chandan K. Reddy
140
2
0
12 May 2025
Re$^2$: A Consistency-ensured Dataset for Full-stage Peer Review and Multi-turn Rebuttal Discussions
Re2^22: A Consistency-ensured Dataset for Full-stage Peer Review and Multi-turn Rebuttal Discussions
Daoze Zhang
Zhijian Bao
S. Du
Zhiyi Zhao
Kuangling Zhang
Dezheng Bao
Yang Yang
63
1
0
12 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
X. Wu
Weinong Wang
Yingying Zhang
Wenqiang Zhang
ReLMLRM
167
3
0
12 May 2025
DanceGRPO: Unleashing GRPO on Visual Generation
DanceGRPO: Unleashing GRPO on Visual Generation
Zeyue Xue
Jie Wu
Yu Gao
Fangyuan Kong
Lingting Zhu
...
Zhiheng Liu
Wei Liu
Qiushan Guo
Weilin Huang
Ping Luo
EGVMVGen
106
8
0
12 May 2025
MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
Aybora Koksal
A. Aydin Alatan
LRM
68
1
0
12 May 2025
Learning from Peers in Reasoning Models
Learning from Peers in Reasoning Models
Tongxu Luo
Wenyu Du
Jiaxi Bi
Stephen Chung
Zhengyang Tang
Hao Yang
M. Zhang
Benyou Wang
LRM
83
0
0
12 May 2025
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Rushi Qiang
Yuchen Zhuang
Yinghao Li
D. Kilman
Rongzhi Zhang
...
Ian Shu-Hei Wong
Sherry Yang
Percy Liang
Chao Zhang
Bo Dai
ELM
143
1
0
12 May 2025
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Xiaokun Wang
Chris
Jiangbo Pei
Wei Shen
Yi Peng
...
Ai Jian
Tianyidan Xie
Xuchen Song
Yang Liu
Yahui Zhou
OffRLLRM
139
2
0
12 May 2025
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAMLELM
124
0
0
12 May 2025
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
Choong Seon Hong
AI4CE
165
1
0
11 May 2025
Applying Cognitive Design Patterns to General LLM Agents
Applying Cognitive Design Patterns to General LLM Agents
R. Wray
James R. Kirk
John E. Laird
LLMAGAI4TSAI4CE
139
0
0
11 May 2025
Implementing Long Text Style Transfer with LLMs through Dual-Layered Sentence and Paragraph Structure Extraction and Mapping
Implementing Long Text Style Transfer with LLMs through Dual-Layered Sentence and Paragraph Structure Extraction and Mapping
Yusen Wu
Xiaotie Deng
79
0
0
11 May 2025
LLM-Flock: Decentralized Multi-Robot Flocking via Large Language Models and Influence-Based Consensus
LLM-Flock: Decentralized Multi-Robot Flocking via Large Language Models and Influence-Based Consensus
Peihan Li
Lifeng Zhou
76
0
0
10 May 2025
LLMs Get Lost In Multi-Turn Conversation
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
127
16
0
09 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
Junxuan Zhang
Jue Wang
Yanjie Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
169
0
0
09 May 2025
Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons
Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons
Andrew Kiruluta
Preethi Raju
Priscilla Burity
39
0
0
09 May 2025
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Azim Ospanov
Farzan Farnia
Roozbeh Yousefzadeh
LRM
153
0
0
09 May 2025
CellVerse: Do Large Language Models Really Understand Cell Biology?
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang
Tianyu Liu
Zhihong Zhu
Yu Wang
Haoyu Wang
Donghao Zhou
Yefeng Zheng
Kun Wang
X. Wu
Pheng-Ann Heng
ELM
80
0
0
09 May 2025
Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought
Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought
Mingfei Zeng
Ming Xie
Xixi Zheng
Chunhai Li
Chuan Zhang
Liehuang Zhu
73
0
0
08 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Yongqian Li
Jiaheng Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
265
5
0
08 May 2025
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Harry Dong
Bilge Acun
Beidi Chen
Yuejie Chi
LRM
78
0
0
08 May 2025
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Yun Wang
Z. Fu
Jie Cai
Peijun Tang
Hongya Lyu
...
Jie Zhou
Guoyang Zeng
Chaojun Xiao
Xu Han
Zhiyuan Liu
141
1
0
08 May 2025
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts
Manik Sheokand
Parth Sawant
ELM
161
0
0
08 May 2025
Scaling Laws for Speculative Decoding
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
135
0
0
08 May 2025
GroverGPT-2: Simulating Grover's Algorithm via Chain-of-Thought Reasoning and Quantum-Native Tokenization
GroverGPT-2: Simulating Grover's Algorithm via Chain-of-Thought Reasoning and Quantum-Native Tokenization
Min Chen
Jinglei Cheng
Pingzhi Li
Haoran Wang
Tianlong Chen
Junyu Liu
LRM
146
0
0
08 May 2025
Adaptive Stress Testing Black-Box LLM Planners
Adaptive Stress Testing Black-Box LLM Planners
Neeloy Chakraborty
John Pohovey
Melkior Ornik
Katherine Driggs-Campbell
94
0
0
08 May 2025
Crosslingual Reasoning through Test-Time Scaling
Crosslingual Reasoning through Test-Time Scaling
Zheng-Xin Yong
Muhammad Farid Adilazuarda
Jonibek Mansurov
Ruochen Zhang
Niklas Muennighoff
Carsten Eickhoff
Genta Indra Winata
Julia Kreutzer
Stephen H. Bach
Alham Fikri Aji
LRMELM
465
9
0
08 May 2025
Replay to Remember (R2R): An Efficient Uncertainty-driven Unsupervised Continual Learning Framework Using Generative Replay
Replay to Remember (R2R): An Efficient Uncertainty-driven Unsupervised Continual Learning Framework Using Generative Replay
Sriram Mandalika
Harsha Vardhan
Athira Nambiar
VLM
233
0
0
07 May 2025
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Zhenghao Xing
Xiaowei Hu
Chi-Wing Fu
Wei Wang
Jifeng Dai
Pheng-Ann Heng
MLLMOffRLVLMLRM
112
4
0
07 May 2025
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Wenhao Li
Bo Jin
Mingyi Hong
Changhong Lu
Xiangfeng Wang
168
0
0
07 May 2025
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization
Wenjun Cao
AAML
89
0
0
07 May 2025
MedSyn: Enhancing Diagnostics with Human-AI Collaboration
MedSyn: Enhancing Diagnostics with Human-AI Collaboration
Burcu Sayin
Ipek Baris Schlicht
Ngoc Vo Hong
Sara Allievi
Jacopo Staiano
Pasquale Minervini
Andrea Passerini
LM&MA
39
0
0
07 May 2025
Benchmarking LLMs' Swarm intelligence
Benchmarking LLMs' Swarm intelligence
Kai Ruan
Mowen Huang
Ji-Rong Wen
Hao Sun
264
0
0
07 May 2025
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios
Ning Cheng
Jinan Xu
Jialing Chen
Wenjuan Han
LRM
89
0
0
07 May 2025
VideoPath-LLaVA: Pathology Diagnostic Reasoning Through Video Instruction Tuning
VideoPath-LLaVA: Pathology Diagnostic Reasoning Through Video Instruction Tuning
T. Vuong
J. T. Kwak
VGen
115
0
0
07 May 2025
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
Yehui Tang
Yichun Yin
Yaoyuan Wang
Hang Zhou
Yu Pan
...
Zhe Liu
Zhicheng Liu
Zhuowen Tu
Zilin Ding
Zongyuan Zhan
MoE
117
2
0
07 May 2025
Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters
Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters
David Exler
Mark Schutera
Markus Reischl
Luca Rettenberger
105
1
0
07 May 2025
Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis
Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis
Shuang Zhou
Jiashuo Wang
Zidu Xu
Song Wang
David Brauer
...
Zaifu Zhan
Yu Hou
Mingquan Lin
Genevieve B. Melton
Rui Zhang
78
0
0
06 May 2025
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics
Qingbin Liu
Xiaohan Lin
Jonas Bayer
Yael Dillies
Weijie Jiang
...
Zhengfeng Yang
Jiawei Zhang
Lihong Zhi
Jia-Nan Li
Zhengying Liu
287
2
0
06 May 2025
RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning
RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning
Liam Boyle
Nicolas Baumann
Paviththiren Sivasothilingam
Michele Magno
Luca Benini
LM&RoLRM
181
0
0
06 May 2025
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Yibin Wang
Zhimin Li
Yuhang Zang
Chunyu Wang
Qinglin Lu
Cheng Jin
Jinqiao Wang
LRM
157
11
0
06 May 2025
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks
Token Communication-Driven Multimodal Large Models in Resource-Constrained Multiuser Networks
Junhe Zhang
Wanli Ni
Pengwei Wang
Dongyu Wang
84
0
0
06 May 2025
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
Junyu Ma
Tianqing Fang
Zizhuo Zhang
Hongming Zhang
Haitao Mi
Dong Yu
ReLMRALMLRM
506
1
0
06 May 2025
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu
Yiran Yang
Houxing Ren
Haotian Hou
Han Xiao
Ke Wang
Weikang Shi
Aojun Zhou
Mingjie Zhan
Haoyang Li
LLMAG
129
1
0
06 May 2025
Previous
123...131415...252627
Next