ResearchTrend.AI
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM

Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"

50 / 1,327 papers shown
ReasoningV: Efficient Verilog Code Generation with Adaptive Hybrid Reasoning Model
Haiyan Qin
Zhiwei Xie
Jingjing Li
Liangchen Li
Xiaotong Feng
Jing Liu
Wang Kang
OffRL, LRM
467
1
0
20 Apr 2025
Towards Optimal Circuit Generation: Multi-Agent Collaboration Meets Collective Intelligence
Haiyan Qin
Jiahao Feng
Xiaotong Feng
Wei W. Xing
Wang Kang
125
0
0
20 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
195
1
0
19 Apr 2025
CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations
Man Ho Lam
Chaozheng Wang
Jen-tse Huang
Michael R. Lyu
LRM
112
1
0
19 Apr 2025
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
Xinyu Zhang
Jiadong Wang
Zifei Cheng
Wenhao Zhuang
Zheng Lin
...
Shouyu Yin
Chaohang Wen
Haotian Zhang
Bin Chen
Bing Yu
LRM
178
13
0
19 Apr 2025
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Bowen Jiang
Zhuoqun Hao
Y. Cho
B. Li
Yuan Yuan
Sihao Chen
Lyle Ungar
Camillo J Taylor
Dan Roth
137
3
0
19 Apr 2025
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Shihan Dou
Muling Wu
Jingwen Xu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL, LRM
91
2
0
19 Apr 2025
Learning to Attribute with Attention
Benjamin Cohen-Wang
Yung-Sung Chuang
Aleksander Madry
66
0
0
18 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM, LRM
265
128
0
18 Apr 2025
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
Enhao Huang
Rainy Sun
Anya Reese
Alex Chen
Alex Chen
...
Gang Zhao
Garry Zhao
Frank Li
Hobert Wong
Lowes Yang
ALM, ELM
127
0
0
18 Apr 2025
Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization
Hongwei Ji
Wulian Yun
Mengshi Qi
Huadong Ma
LRM
447
0
0
18 Apr 2025
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Junlin Wang
Shang Zhu
Jon Saad-Falcon
Ben Athiwaratkun
Qingyang Wu
Jue Wang
Shuaiwen Leon Song
Ce Zhang
Bhuwan Dhingra
James Y. Zou
LRM
107
10
0
18 Apr 2025
Can Machine Learning Agents Deal with Hard Choices?
Kangyu Wang
345
0
0
18 Apr 2025
LangCoop: Collaborative Driving with Language
Xiangbo Gao
Yuheng Wu
Rujia Wang
Chenxi Liu
Yang Zhou
Zhengzhong Tu
VLM
124
2
0
18 Apr 2025
Compile Scene Graphs with Reinforcement Learning
Zuyao Chen
Jinlin Wu
Zhen Lei
Marc Pollefeys
Chang Wen Chen
OffRL, LRM
177
3
0
18 Apr 2025
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Yixuan Even Xu
Yash Savani
Fei Fang
Zico Kolter
OffRL
123
12
0
18 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL, LRM
501
16
0
17 Apr 2025
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
Zhouyang Jiang
Bin Zhang
Airong Wei
Zhiwei Xu
OffRL
150
0
0
17 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
Xinyu Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
114
0
0
17 Apr 2025
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training
Xinsong Zhang
Yarong Zeng
Xinting Huang
Hu Hu
Runquan Xie
Han Hu
Zhanhui Kang
MLLM, VLM
279
2
0
17 Apr 2025
SkyReels-V2: Infinite-length Film Generative Model
Guibin Chen
D. Lin
Jiangping Yang
Chunze Lin
J. Zhu
...
Di Qiu
Debang Li
Zhengcong Fei
Yang Li
Yahui Zhou
DiffM, VGen
132
10
0
17 Apr 2025
VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
Menglan Chen
Xianghe Pang
Jingjing Dong
Wenhao Wang
Yaxin Du
Siheng Chen
LRM
126
0
0
17 Apr 2025
Antidistillation Sampling
Yash Savani
Asher Trockman
Zhili Feng
Avi Schwarzschild
Alexander Robey
Marc Finzi
J. Zico Kolter
140
3
0
17 Apr 2025
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Mehmet Hamza Erol
Batu El
Mirac Suzgun
Mert Yuksekgonul
J. Zou
ELM
81
1
0
17 Apr 2025
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
Zhongxi Qiu
Zhang Zhang
Yan Hu
Heng Li
Jiang-Dong Liu
OffRL
454
1
0
16 Apr 2025
Evaluating the Diversity and Quality of LLM Generated Content
Alexander Shypula
Shuo Li
Botong Zhang
Vishakh Padmakumar
Kayo Yin
Osbert Bastani
107
5
0
16 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM, LRM, AI4CE
174
9
0
16 Apr 2025
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
Yiyou Sun
Georgia Zhou
Haoran Wang
Dexun Li
Nouha Dziri
Dawn Song
ReLM, ALM, ELM, LRM
141
6
1
16 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ReLM, OffRL, LRM
484
4
1
15 Apr 2025
Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment
Jiseon Kim
Jea Kwon
L. Vecchietti
Alice Oh
Meeyoung Cha
65
1
0
15 Apr 2025
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation
Linus Jern
Valter Uotila
Cong Yu
Bo Zhao
MQ, LRM
95
0
0
15 Apr 2025
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Tianyi Zhang
Yang Sui
Shaochen Zhong
Vipin Chaudhary
Helen Zhou
Anshumali Shrivastava
MQ
87
2
0
15 Apr 2025
Xpose: Bi-directional Engineering for Hidden Query Extraction
Ahana Pradhan
Jayant Haritsa
48
0
0
15 Apr 2025
Assessment of Evolving Large Language Models in Upper Secondary Mathematics
Mika Setälä
Pieta Sikström
Ville Heilala
T. Karkkainen
ELM, LRM
122
1
0
15 Apr 2025
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
Haiming Wang
Mert Unsal
Xiaohan Lin
Mantas Baksys
Qingbin Liu
...
Zhouliang Yu
Ziyi Wang
Zhilin Yang
Zhengying Liu
Jia-Nan Li
AIMat, ReLM, AI4TS, LRM
158
18
0
15 Apr 2025
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints
Ruicheng Ao
Gan Luo
D. Simchi-Levi
Xinshang Wang
98
2
0
15 Apr 2025
Propaganda via AI? A Study on Semantic Backdoors in Large Language Models
Nay Myat Min
Long H. Pham
Yige Li
Jun Sun
AAML
82
0
0
15 Apr 2025
Deep Learning in Concealed Dense Prediction
Pancheng Zhao
Deng-Ping Fan
Shupeng Cheng
Salman Khan
Fahad Shahbaz Khan
David Clifton
Peng Xu
Jufeng Yang
VLM
134
1
0
15 Apr 2025
LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation
Hengyu Shi
Junhao Su
Huansheng Ning
Xiaoming Wei
Jialin Gao
3DV, AI4TS, LRM
108
0
0
15 Apr 2025
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
Nicolas Baumann
Cheng Hu
Paviththiren Sivasothilingam
Haotong Qin
Lei Xie
Michele Magno
Luca Benini
92
2
0
15 Apr 2025
A comprehensive review of remote sensing in wetland classification and mapping
Shuai Yuan
Xiangan Liang
Tianwu Lin
Shuang Chen
Rui Liu
Jie Wang
Huatian Zhang
Peng Gong
110
1
0
15 Apr 2025
TextArena
Leon Guertler
Bobby Cheng
Simon Yu
Bo Liu
Leshem Choshen
Cheston Tan
LLMAG
136
2
0
15 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM, LRM
435
13
0
15 Apr 2025
Offline Learning and Forgetting for Reasoning with Large Language Models
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLM, CLL, LRM
476
1
0
15 Apr 2025
GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
Run Luo
Lu Wang
Wanwei He
Xiaobo Xia
LLMAG
201
35
0
14 Apr 2025
Reasoning without Regret
Tarun Chitra
OffRL, LRM
83
0
0
14 Apr 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
Kenneth Enevoldsen
Niklas Muennighoff
VLM
159
2
0
14 Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi
Jiaheng Liu
Yushuo Guan
Zhikai Wu
Yize Zhang
...
Bohan Zeng
Wei Zhang
Fuzheng Zhang
Wenjing Yang
Di Zhang
VGen, VLM
140
2
0
14 Apr 2025
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography
I-Sheng Fang
Jun-Cheng Chen
LRM, VLM
172
0
0
14 Apr 2025
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG, LRM
441
9
0
14 Apr 2025