arXiv:2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Z. Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Jianxin Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Renqi Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 788 papers shown
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
92
3
0
30 Apr 2025
When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator
Md Fahim Anjum
LRM
34
0
0
30 Apr 2025
Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
Hanjun Luo
Haiying He
Yucheng Wang
Jinluan Yang
Rui Liu
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
Li Shen
LRM
31
1
0
30 Apr 2025
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling
Siqi Li
Yufan Shen
Xiangnan Chen
Jiayi Chen
Hengwei Ju
...
Licheng Wen
Botian Shi
Y. Liu
Xinyu Cai
Yu Qiao
VLM
ELM
96
0
0
30 Apr 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
...
Beibin Li
Chi Wang
H. Wang
Yuxiao Chen
Qingyun Wu
49
1
0
30 Apr 2025
CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
Sizhe Wang
Zihan Wang
Dongsheng Ma
Yongan Yu
Rui Ling
Zehan Li
Zhiyu Li
Wenbo Zhang
LRM
65
0
0
30 Apr 2025
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
Jingyang Yi
Jiazheng Wang
Sida Li
ReLM
OODD
LRM
198
2
0
30 Apr 2025
Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam
Thomas L. Griffiths
LLMAG
94
1
0
29 Apr 2025
ReCIT: Reconstructing Full Private Data from Gradient in Parameter-Efficient Fine-Tuning of Large Language Models
Jin Xie
Ruishi He
Songze Li
Xiaojun Jia
Shouling Ji
SILM
AAML
68
0
0
29 Apr 2025
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Hasan Hammoud
Hani Itani
Guohao Li
ReLM
LRM
80
1
0
29 Apr 2025
HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation
Cristina Garbacea
Chenhao Tan
55
0
0
29 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
131
9
0
29 Apr 2025
GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets
Mingqian He
Fei Zhao
Chonggang Lu
Ziqiang Liu
Yishuo Wang
Haofu Qian
OffRL
AI4TS
VLM
72
0
0
28 Apr 2025
LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning
Peijian Zeng
Feiyan Pang
Zhanbo Wang
Aimin Yang
74
0
0
28 Apr 2025
PhenoAssistant: A Conversational Multi-Agent AI System for Automated Plant Phenotyping
Feng Chen
Ilias Stogiannidis
Andrew Wood
Danilo Bueno
Dominic Williams
...
Stephen A. Rolfe
Tracy Lawson
Tony Pridmore
M. Giuffrida
Sotirios A. Tsaftaris
62
0
0
28 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
Shri Kiran Srinivasan
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MA
ELM
101
0
0
28 Apr 2025
Mitigating Societal Cognitive Overload in the Age of AI: Challenges and Directions
Salem Lahlou
62
0
0
28 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
91
0
0
27 Apr 2025
GenTorrent: Scaling Large Language Model Serving with An Overlay Network
Fei Fang
Yifan Hua
Shengze Wang
Ruilin Zhou
Y. Liu
Chen Qian
Jiahui Geng
63
0
0
27 Apr 2025
Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning
Abdellah Ghassel
Xianzhi Li
Xiaodan Zhu
51
0
0
26 Apr 2025
LawFlow: Collecting and Simulating Lawyers' Thought Processes
Debarati Das
Khanh Chi Le
R. Parkar
Karin de Langis
Brendan Madson
...
Robin M. Willis
Daniel H. Moses
Brett McDonnell
Daniel Schwarcz
Dongyeop Kang
AILaw
223
0
0
26 Apr 2025
Pushing the boundary on Natural Language Inference
Pablo Miralles-González
Javier Huertas-Tato
Alejandro Martín
David Camacho
LRM
49
0
0
25 Apr 2025
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
Shaokun Zhang
Yi Dong
Jieyu Zhang
Jan Kautz
Bryan Catanzaro
Andrew Tao
Qingyun Wu
Zhiding Yu
Guilin Liu
LLMAG
OffRL
KELM
LRM
91
0
0
25 Apr 2025
Efficient Single-Pass Training for Multi-Turn Reasoning
Ritesh Goru
Shanay Mehta
Prateek Jain
LRM
32
0
0
25 Apr 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli
Kentaroh Toyoda
Yuan Wang
Leon Witt
Muhammad Asif Ali
Yukai Miao
Dan Li
Qingsong Wei
UQCV
94
0
0
25 Apr 2025
Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family
Pierre-Carl Langlais
Pavel Chizhov
Mattia Nee
Carlos Rosas Hinostroza
Matthieu Delsart
Irène Girard
Othman Hicheur
Anastasia Stasenko
Ivan P. Yamshchikov
LRM
68
0
0
25 Apr 2025
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Yufei Wang
Pei Zhang
Jialong Tang
Haoran Wei
Baosong Yang
...
Wenjie Qu
Fei Huang
Junyang Lin
Fei Huang
Jingren Zhou
LRM
57
1
0
25 Apr 2025
AI Awareness
Xianrui Li
Haoyuan Shi
Rongwu Xu
Wei Xu
59
0
0
25 Apr 2025
Dargana: fine-tuning EarthPT for dynamic tree canopy mapping from space
Michael J. Smith
Luke Fleming
James E. Geach
Ryan J. Roberts
Freddie Kalaitzis
James Banister
29
0
0
24 Apr 2025
The Role of Open-Source LLMs in Shaping the Future of GeoAI
Xiao Shi Huang
Zhengzhong Tu
X. Ye
Michael Goodchild
50
0
0
24 Apr 2025
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
Ivan Moshkov
Darragh Hanley
Ivan Sorokin
Shubham Toshniwal
Christof Henkel
Benedikt Schifferer
Wei Du
Igor Gitman
ReLM
LRM
45
3
0
23 Apr 2025
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
44
2
0
23 Apr 2025
Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Case Study
Mohammad Khodadad
Ali Shiraee Kasmaee
Mahdi Astaraki
Nicholas Sherck
H. Mahyar
Soheila Samiee
LRM
196
0
0
23 Apr 2025
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
Balaji Rao
William Eiers
Carlo Lipizzi
37
0
0
23 Apr 2025
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Chris
Yichen Wei
Yi Peng
Xuben Wang
Weijie Qiu
...
Jianhao Zhang
Y. Hao
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
AI4TS
SyDa
LRM
VLM
79
0
0
23 Apr 2025
ZipR1: Reinforcing Token Sparsity in MLLMs
Feng Chen
Yefei He
Lequan Lin
Qingbin Liu
Bohan Zhuang
Qi Wu
51
0
0
23 Apr 2025
Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning
Josefa Lia Stoisser
Marc Boubnovski Martell
Julien Fauqueur
LMTD
ReLM
AI4TS
LRM
98
0
0
23 Apr 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
85
0
0
23 Apr 2025
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey
David Evans
LLMSV
78
1
0
23 Apr 2025
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Ziqiang Liu
Dong Li
E. Barsoum
61
0
0
23 Apr 2025
Compass-V2 Technical Report
Sophia Maria
MoE
LRM
41
0
0
22 Apr 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu
Shaoyang Guo
Zhuo-Yang Song
Yizhou Sun
Zeyu Cai
...
Ming-xing Luo
Muhan Zhang
Yaodong Yang
Muhan Zhang
Hua Xing Zhu
AIMat
LRM
32
0
0
22 Apr 2025
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation
Chanyeol Choi
Jihoon Kwon
Jaeseon Ha
Hojun Choi
Chaewoon Kim
Yongjae Lee
Jy-yong Sohn
Alejandro Lopez-Lira
RALM
61
0
0
22 Apr 2025
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
51
1
0
22 Apr 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo
Liangbing Zhao
Sayak Paul
Yue Liao
Renrui Zhang
Yi Xin
Peng Gao
Mohamed Elhoseiny
Yiming Li
VLM
75
0
0
22 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Shang Qu
Li Sheng
Xuekai Zhu
Biqing Qi
Youbang Sun
Ganqu Cui
Ning Ding
Bowen Zhou
OffRL
168
5
0
22 Apr 2025
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning
Cheng Wen
Tingwei Guo
Shuaijiang Zhao
Wei Zou
Xiangang Li
OffRL
AuLLM
LRM
62
3
0
22 Apr 2025
Tina: Tiny Reasoning Models via LoRA
Shangshang Wang
Julian Asilis
Ömer Faruk Akgül
Enes Burak Bilgin
Ollie Liu
Willie Neiswanger
OffRL
LRM
41
3
0
22 Apr 2025
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
Junyuan Deng
Xinyi Wu
Yongxing Yang
Congchao Zhu
Song Wang
Zhenyao Wu
45
0
0
21 Apr 2025
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
Yuan-Hong Liao
Sven Elflein
Liu He
Laura Leal-Taixe
Yejin Choi
Sanja Fidler
David Acuna
ReLM
LRM
VLM
183
0
0
21 Apr 2025