ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2501.12948
  4. Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Renqi Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
    ReLM
    VLM
    OffRL
    AI4TS
    LRM
ArXivPDFHTML

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 830 papers shown
Title
OTC: Optimal Tool Calls via Reinforcement Learning
OTC: Optimal Tool Calls via Reinforcement Learning
Hongru Wang
Cheng Qian
Wanjun Zhong
Xiusi Chen
Jiahao Qiu
Shijue Huang
Bowen Jin
Mengdi Wang
Kam-Fai Wong
Heng Ji
OffRL
LRM
50
3
0
21 Apr 2025
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan
Chen Henry Wu
Charles Ding
Aditi Raghunathan
50
0
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
44
1
0
21 Apr 2025
aiXamine: Simplified LLM Safety and Security
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
84
0
0
21 Apr 2025
Towards Understanding Camera Motions in Any Video
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin
Siyuan Cen
Daniel Jiang
Jay Karhade
Hewei Wang
...
Rushikesh Zawar
Xue Bai
Yilun Du
Chuang Gan
Deva Ramanan
VGen
39
0
0
21 Apr 2025
Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey
Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey
Ahsan Bilal
Muhammad Ahmed Mohsin
Muhammad Umer
Muhammad Awais Khan Bangash
Muhammad Ali Jamshed
LLMAG
LRM
AI4CE
61
0
0
20 Apr 2025
PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind
Yajie Yu
Yue Feng
LLMAG
38
0
0
20 Apr 2025
ReasoningV: Efficient Verilog Code Generation with Adaptive Hybrid Reasoning Model
ReasoningV: Efficient Verilog Code Generation with Adaptive Hybrid Reasoning Model
Haiyan Qin
Zhiwei Xie
Jingjing Li
Liangchen Li
Xiaotong Feng
Jing Liu
Wang Kang
OffRL
LRM
242
0
0
20 Apr 2025
Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension
Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension
Lin Li
Wei Chen
Jiahui Li
Lu Chen
Long Chen
LRM
57
1
0
20 Apr 2025
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs
Minh V.T. Pham
Huy N. Phan
Hoang N. Phan
Cuong Le Chi
T. Nguyen
Nghi D. Q. Bui
SyDa
38
0
0
20 Apr 2025
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
Yunhui Xia
Wei Shen
Yan Wang
Jason Klein Liu
Huifeng Sun
Siyue Wu
Jian Hu
Xiaolong Xu
AI4TS
35
1
0
20 Apr 2025
Towards Optimal Circuit Generation: Multi-Agent Collaboration Meets Collective Intelligence
Towards Optimal Circuit Generation: Multi-Agent Collaboration Meets Collective Intelligence
Haiyan Qin
Jiahao Feng
Xiaotong Feng
Wei W. Xing
Wang Kang
48
0
0
20 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
41
0
0
19 Apr 2025
CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations
CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations
Man Ho Adrian Lam
Chaozheng Wang
Jen-tse Huang
M. Lyu
LRM
41
0
0
19 Apr 2025
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
X. Zhang
Rongxiang Weng
Zifei Cheng
Wenhao Zhuang
Zheng Lin
...
Shouyu Yin
Chaohang Wen
Haotian Zhang
Bin Chen
Bing Yu
LRM
43
6
0
19 Apr 2025
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Bowen Jiang
Zhuoqun Hao
Y. Cho
B. Li
Yuan Yuan
Sihao Chen
Lyle Ungar
Camillo J Taylor
Dan Roth
49
1
0
19 Apr 2025
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Shihan Dou
Muling Wu
Jingwen Xu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL
LRM
37
0
0
19 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
63
23
0
18 Apr 2025
Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization
Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization
Hongwei Ji
Wulian Yun
Mengshi Qi
Huadong Ma
LRM
242
0
0
18 Apr 2025
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Yixuan Even Xu
Yash Savani
Fei Fang
Zico Kolter
OffRL
49
2
0
18 Apr 2025
Compile Scene Graphs with Reinforcement Learning
Compile Scene Graphs with Reinforcement Learning
Zuyao Chen
Jinlin Wu
Zhen Lei
Marc Pollefeys
Chang Wen Chen
OffRL
LRM
67
1
0
18 Apr 2025
LangCoop: Collaborative Driving with Language
LangCoop: Collaborative Driving with Language
Xiangbo Gao
Yuheng Wu
Rujia Wang
Chenxi Liu
Yang Zhou
Zhengzhong Tu
VLM
48
0
0
18 Apr 2025
Learning to Attribute with Attention
Learning to Attribute with Attention
Benjamin Cohen-Wang
Yung-Sung Chuang
Aleksander Madry
37
0
0
18 Apr 2025
Can Machine Learning Agents Deal with Hard Choices?
Can Machine Learning Agents Deal with Hard Choices?
Kangyu Wang
269
0
0
18 Apr 2025
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
Enhao Huang
Rainy Sun
Anya Reese
Alex Chen
Alex Chen
...
Gang Zhao
Garry Zhao
Frank Li
Hobert Wong
Lowes Yang
ALM
ELM
77
0
0
18 Apr 2025
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Junlin Wang
Shang Zhu
Jon Saad-Falcon
Ben Athiwaratkun
Qingyang Wu
Jue Wang
Shuaiwen Leon Song
Ce Zhang
Bhuwan Dhingra
James Y. Zou
LRM
57
1
0
18 Apr 2025
VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
Menglan Chen
Xianghe Pang
Jingjing Dong
Wenhao Wang
Yaxin Du
Siheng Chen
LRM
44
0
0
17 Apr 2025
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Mehmet Hamza Erol
Batu El
Mirac Suzgun
Mert Yuksekgonul
J. Zou
ELM
45
0
0
17 Apr 2025
SkyReels-V2: Infinite-length Film Generative Model
SkyReels-V2: Infinite-length Film Generative Model
Guibin Chen
D. Lin
Jiangping Yang
Chunze Lin
J. Zhu
...
Di Qiu
Debang Li
Zhengcong Fei
Yang Li
Yahui Zhou
DiffM
VGen
61
2
0
17 Apr 2025
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training
Xinsong Zhang
Yarong Zeng
Xinting Huang
Hu Hu
Runquan Xie
Han Hu
Zhanhui Kang
MLLM
VLM
59
1
0
17 Apr 2025
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
Zhouyang Jiang
Bin Zhang
Airong Wei
Zhiwei Xu
OffRL
44
0
0
17 Apr 2025
Antidistillation Sampling
Antidistillation Sampling
Yash Savani
Asher Trockman
Zhili Feng
Avi Schwarzschild
Alexander Robey
Marc Finzi
J. Zico Kolter
46
0
0
17 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
241
2
0
17 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
Xinyu Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
39
0
0
17 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM
LRM
AI4CE
58
2
0
16 Apr 2025
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
Zhongxi Qiu
Zhang Zhang
Yan Hu
Heng Li
Jiang-Dong Liu
OffRL
242
0
0
16 Apr 2025
Evaluating the Diversity and Quality of LLM Generated Content
Evaluating the Diversity and Quality of LLM Generated Content
Alexander Shypula
Shuo Li
Botong Zhang
Vishakh Padmakumar
Kayo Yin
Osbert Bastani
61
2
0
16 Apr 2025
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
Yiyou Sun
Georgia Zhou
Haoran Wang
Dexun Li
Nouha Dziri
Dawn Song
ReLM
ALM
ELM
LRM
80
2
1
16 Apr 2025
Xpose: Bi-directional Engineering for Hidden Query Extraction
Xpose: Bi-directional Engineering for Hidden Query Extraction
Ahana Pradhan
Jayant Haritsa
24
0
0
15 Apr 2025
A comprehensive review of remote sensing in wetland classification and mapping
A comprehensive review of remote sensing in wetland classification and mapping
Shuai Yuan
Xiangan Liang
Tianwu Lin
Shuang Chen
Rui Liu
Jie Wang
Huatian Zhang
Peng Gong
36
0
0
15 Apr 2025
Deep Learning in Concealed Dense Prediction
Deep Learning in Concealed Dense Prediction
Pancheng Zhao
Deng-Ping Fan
Shupeng Cheng
Salman Khan
Fahad Shahbaz Khan
David Clifton
Peng Xu
Jufeng Yang
VLM
39
0
0
15 Apr 2025
Teaching Large Language Models to Reason through Learning and Forgetting
Teaching Large Language Models to Reason through Learning and Forgetting
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLM
CLL
LRM
243
0
0
15 Apr 2025
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation
Linus Jern
Valter Uotila
Cong Yu
Bo Zhao
MQ
LRM
35
0
0
15 Apr 2025
Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment
Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment
Jiseon Kim
Jea Kwon
L. Vecchietti
Alice Oh
Meeyoung Cha
34
0
0
15 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
270
1
1
15 Apr 2025
Propaganda via AI? A Study on Semantic Backdoors in Large Language Models
Propaganda via AI? A Study on Semantic Backdoors in Large Language Models
Nay Myat Min
Long H. Pham
Yige Li
Jun Sun
AAML
33
0
0
15 Apr 2025
Efficient Reasoning Models: A Survey
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
204
5
0
15 Apr 2025
TextArena
TextArena
Leon Guertler
Bobby Cheng
Simon Yu
Bo Liu
Leshem Choshen
Cheston Tan
LLMAG
45
1
0
15 Apr 2025
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints
Ruicheng Ao
Gan Luo
D. Simchi-Levi
Xinshang Wang
39
2
0
15 Apr 2025
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
Nicolas Baumann
Cheng Hu
Paviththiren Sivasothilingam
Haotong Qin
Lei Xie
Michele Magno
Luca Benini
37
1
0
15 Apr 2025
Previous
123...678...151617
Next