ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2501.12948
  4. Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
    ReLMVLMOffRLAI4TSLRM
ArXiv (abs)PDFHTML

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
Title
Graph-based Approaches and Functionalities in Retrieval-Augmented Generation: A Comprehensive Survey
Graph-based Approaches and Functionalities in Retrieval-Augmented Generation: A Comprehensive Survey
Zulun Zhu
Tiancheng Huang
Kai Wang
Junda Ye
Xiao Chen
Siqiang Luo
3DV
152
0
0
08 Apr 2025
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Jingyuan Zhang
Qi Wang
Xingguang Ji
Yang Liu
Yang Yue
Fuzheng Zhang
Di Zhang
Guorui Zhou
Kun Gai
LRM
119
7
0
08 Apr 2025
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Sanchit Kabra
Akshita Jha
Chandan K. Reddy
LRM
184
1
0
08 Apr 2025
SkillFlow: Efficient Skill and Code Transfer Through Communication in Adapting AI Agents
SkillFlow: Efficient Skill and Code Transfer Through Communication in Adapting AI Agents
Pagkratios Tagkopoulos
Fangzhou Li
I. Tagkopoulos
50
0
0
08 Apr 2025
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Pengfei Zhou
Fanrui Zhang
Xiaopeng Peng
Zhaopan Xu
Jiaxin Ai
...
Kai Wang
Xiaojun Chang
Wenqi Shao
Yang You
Jianchao Tan
ELMLRM
113
3
0
08 Apr 2025
Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness
Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness
Dongzhuoran Zhou
Yuqicheng Zhu
Yuan He
Jiaoyan Chen
Evgeny Kharlamov
Steffen Staab
RALM
135
1
0
07 Apr 2025
Concise Reasoning via Reinforcement Learning
Concise Reasoning via Reinforcement Learning
Mehdi Fatemi
Banafsheh Rafiee
Mingjie Tang
Kartik Talamadupula
ReLMOffRLLRM
145
17
0
07 Apr 2025
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
AAML
147
0
0
07 Apr 2025
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
Zonghang Li
Tao Li
Wenjiao Feng
Mohsen Guizani
Hongfang Yu
34
0
0
07 Apr 2025
The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning
The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning
Dong Won Lee
Y. Kim
Denison Guvenoz
Sooyeon Jeong
Parker Malachowsky
Louis-Philippe Morency
C. Breazeal
Hae Won Park
95
0
0
07 Apr 2025
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu
Yuxuan Sun
Manyi Zhang
Haoli Bai
Xianzhi Yu
Tiezheng Yu
C. Yuan
Lu Hou
MQLRM
134
11
0
07 Apr 2025
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
Runjin Chen
Zhenyu Zhang
Junyuan Hong
Souvik Kundu
Zhangyang Wang
OffRLLRM
164
14
0
07 Apr 2025
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
Rem Yang
Julian Dai
N. Vasilakis
Martin Rinard
ELMLRM
52
1
0
07 Apr 2025
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
Emre Can Acikgoz
Cheng Qian
Hongru Wang
Vardhan Dongre
Xiusi Chen
Heng Ji
Dilek Hakkani-Tur
Gokhan Tur
LM&RoELM
218
1
0
07 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
107
16
0
07 Apr 2025
scAgent: Universal Single-Cell Annotation via a LLM Agent
scAgent: Universal Single-Cell Annotation via a LLM Agent
Yuren Mao
Yu Mi
Peigen Liu
Mengfei Zhang
Hanqing Liu
Yunjun Gao
LLMAG
60
2
0
07 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Yujin Potter
Tianneng Shi
Zhun Wang
Andy Zhang
Dawn Song
120
2
0
07 Apr 2025
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie
Azalia Mirhoseini
Hao Zhou
Irene Cai
Christopher D. Manning
SyDaOffRLReLMLRM
198
11
0
07 Apr 2025
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Yu Yue
Yufeng Yuan
Qiying Yu
Xiaochen Zuo
Ruofei Zhu
...
Ru Zhang
Xin Liu
Mingxuan Wang
Yonghui Wu
Lin Yan
OffRLLRM
154
39
0
07 Apr 2025
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Ran Xu
W. Shi
Yuchen Zhuang
Yue Yu
Joyce C. Ho
Haoyu Wang
Carl Yang
79
3
0
07 Apr 2025
Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
Adrián Bazaga
Rexhina Blloshmi
Bill Byrne
Adria de Gispert
ReLMLRM
109
1
0
07 Apr 2025
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization
Weiwei Sun
Shengyu Feng
Shanda Li
Yiming Yang
LLMAG
99
5
0
06 Apr 2025
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
109
5
0
06 Apr 2025
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models
Yuheng Wu
Wentao Guo
Zirui Liu
Heng Ji
Zhaozhuo Xu
Denghui Zhang
84
0
0
05 Apr 2025
Rethinking Reflection in Pre-Training
Rethinking Reflection in Pre-Training
Essential AI
Darsh J Shah
Peter Rushton
Somanshu Singla
Mohit Parmar
...
Philip Monk
Platon Mazarakis
Ritvik Kapila
Saurabh Srivastava
Tim Romanski
ReLMLRM
187
14
0
05 Apr 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
103
2
0
05 Apr 2025
Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
Kate Sanders
Benjamin Van Durme
LRM
155
1
0
04 Apr 2025
Learning Lie Group Generators from Trajectories
Learning Lie Group Generators from Trajectories
Lifan Hu
151
9
0
04 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
93
4
0
04 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELMLRM
97
3
0
04 Apr 2025
Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition
Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition
Rishi Hazra
Gabriele Venturato
Pedro Zuidberg Dos Martires
Luc de Raedt
ReLMLRM
119
2
0
04 Apr 2025
Towards Effective EU E-Participation: The Development of AskThePublic
Towards Effective EU E-Participation: The Development of AskThePublic
Kilian Sprenkamp
Nils Messerschmidt
Amir Sartipi
Igor Tchappi
Xiaohui Wu
L. Zavolokina
Gilbert Fridgen
67
0
0
04 Apr 2025
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Simon A. Lee
Anthony Wu
Jeffrey N. Chiang
MedIm
106
6
0
04 Apr 2025
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models
Hung Le
Dai Do
D. Nguyen
Svetha Venkatesh
OffRLLRM
84
1
0
03 Apr 2025
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Liangjie Huang
Dawei Li
Huan Liu
Lu Cheng
LRM
123
0
0
03 Apr 2025
AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology
AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology
Xiang Feng
Wentao Jiang
Zengmao Wang
Yong Luo
Pingbo Xu
Baosheng Yu
Hua Jin
Bo Du
Jing Zhang
ELMLRM
86
0
0
03 Apr 2025
Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence
Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence
Anita Rau
Mark Endo
Josiah Aklilu
Jaewoo Heo
Khaled Saab
Alberto Paderno
Jeffrey Jopling
F. C. Holsinger
Serena Yeung-Levy
101
1
0
03 Apr 2025
Affordable AI Assistants with Knowledge Graph of Thoughts
Affordable AI Assistants with Knowledge Graph of Thoughts
Maciej Besta
Lorenzo Paleari
Jia Hao Andrea Jiang
Robert Gerstenberger
You Wu
...
Torsten Hoefler
Grzegorz Kwa'sniewski
Marcin Copik
H. Niewiadomski
Torsten Hoefler
LLMAGRALM
525
0
0
03 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
115
3
0
03 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRLLRM
145
9
0
03 Apr 2025
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?
Andres Algaba
Vincent Holst
Floriano Tori
Melika Mobini
Brecht Verbeken
Sylvia Wenmackers
Vincent Ginis
128
1
0
03 Apr 2025
Generative Evaluation of Complex Reasoning in Large Language Models
Generative Evaluation of Complex Reasoning in Large Language Models
Haowei Lin
Xiang Wang
Ruilin Yan
Baizhou Huang
Haotian Ye
Jianhua Zhu
Zihao Wang
James Zou
Jianzhu Ma
Yitao Liang
ReLMELMLRM
449
0
0
03 Apr 2025
Noiser: Bounded Input Perturbations for Attributing Large Language Models
Noiser: Bounded Input Perturbations for Attributing Large Language Models
Mohammad Reza Ghasemi Madani
Aryo Pradipta Gema
Gabriele Sarti
Yu Zhao
Pasquale Minervini
Andrea Passerini
AAML
121
1
0
03 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
130
4
0
03 Apr 2025
Understanding Aha Moments: from External Observations to Internal Mechanisms
Understanding Aha Moments: from External Observations to Internal Mechanisms
Shu Yang
Junchao Wu
Xin Chen
Yunze Xiao
Xinyi Yang
Derek F. Wong
Di Wang
LRM
81
10
0
03 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRLLRM
217
54
0
03 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
171
5
0
03 Apr 2025
MegaMath: Pushing the Limits of Open Math Corpora
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou
Zengzhi Wang
Nikhil Ranjan
Zhoujun Cheng
Liping Tang
Guowei He
Zhengzhong Liu
Eric P. Xing
LRM
151
3
0
03 Apr 2025
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Daoguang Zan
Zhirong Huang
Wei Liu
Hanwu Chen
L. Zhang
...
Jing Su
Tianyu Liu
Rui Long
Kai Shen
Liang Xiang
115
7
0
03 Apr 2025
Statics of continuum planar grasping
Statics of continuum planar grasping
Udit Halder
44
0
0
03 Apr 2025
Previous
123...181920...252627
Next