Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Title
Output Length Effect on DeepSeek-R1's Safety in Forced Thinking
Xuying Li
Zhuo Li
Yuji Kosuga
Victor Bian
AAML
LRM
108
4
0
02 Mar 2025
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Toby Simonds
Akira Yoshiyama
LRM
124
6
0
02 Mar 2025
Instructor-Worker Large Language Model System for Policy Recommendation: a Case Study on Air Quality Analysis of the January 2025 Los Angeles Wildfires
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
91
1
0
01 Mar 2025
An evaluation of DeepSeek Models in Biomedical Natural Language Processing
Zaifu Zhan
Shuang Zhou
Huixue Zhou
Jiawen Deng
Yu Hou
Jeremy Yeung
Rui Zhang
ELM
89
1
0
01 Mar 2025
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Yichang Xu
Ling Liu
145
22
0
01 Mar 2025
Collective Reasoning Among LLMs: A Framework for Answer Validation Without Ground Truth
Seyed Pouyan Mousavi Davoudi
Alireza Shafiee Fard
Alireza Amiri-Margavi
Mahdi Jafari
LRM
128
0
0
28 Feb 2025
À la recherche du sens perdu: your favourite LLM might have more to say than you can understand
K. O. T. Erziev
98
0
0
28 Feb 2025
Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs
Xiaomin Li
Zhou Yu
Ziji Zhang
Yingying Zhuang
Siyang Song
Narayanan Sadagopan
Anurag Beniwal
HILM
130
1
0
28 Feb 2025
FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
Ge Li
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
144
0
0
28 Feb 2025
Climate And Resource Awareness is Imperative to Achieving Sustainable AI (and Preventing a Global AI Arms Race)
Pedram Bakhtiarifard
Pınar Tözün
Christian Igel
Raghavendra Selvan
127
0
0
27 Feb 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
LRM
RALM
124
0
0
27 Feb 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Ziyi Wang
Yang Zhou
ELM
VLM
LRM
130
1
0
27 Feb 2025
Deterministic or probabilistic? The psychology of LLMs as random number generators
Javier Coronado-Blázquez
91
1
0
27 Feb 2025
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Minggui He
Yilun Liu
Shimin Tao
Yuanchang Luo
Hongyong Zeng
...
Daimeng Wei
Weibin Meng
Hao Yang
Boxing Chen
Osamu Yoshie
LRM
171
8
0
27 Feb 2025
Can Textual Gradient Work in Federated Learning?
Minghui Chen
Ruinan Jin
Wenlong Deng
Yuanyuan Chen
Zhi Huang
Han Yu
Xiaoxiao Li
FedML
184
6
0
27 Feb 2025
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Toru Lin
Kartik Sachdev
Linxi Fan
Jitendra Malik
Yuke Zhu
130
11
0
27 Feb 2025
NeoBERT: A Next-Generation BERT
Lola Le Breton
Quentin Fournier
Mariam El Mezouar
John X. Morris
Sarath Chandar
AI4TS
145
1
0
26 Feb 2025
General Intelligence Requires Reward-based Pretraining
Seungwook Han
Jyothish Pari
Samuel J. Gershman
Pulkit Agrawal
LRM
384
2
0
26 Feb 2025
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems
Longling Geng
Edward Y. Chang
LLMAG
130
4
0
26 Feb 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
311
13
0
26 Feb 2025
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation
Humza Sami
Mubashir ul Islam
Samy Charas
Asav Gandhi
P. Gaillardon
V. Tenace
LLMAG
142
2
0
26 Feb 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
Jing Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Zhenru Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM
LRM
177
17
0
26 Feb 2025
TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
Tong Wu
Junzhe Shen
Zixia Jia
Yanjie Wang
Zilong Zheng
129
1
0
26 Feb 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRM
LM&MA
VLM
161
42
0
26 Feb 2025
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha
Shashwat Goel
Ponnurangam Kumaraguru
Jonas Geiping
Matthias Bethge
Ameya Prabhu
ReLM
ELM
LRM
225
0
0
26 Feb 2025
DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning
Pusheng Xu
Yue Wu
Kai Jin
Xiaolan Chen
M. He
Danli Shi
ELM
VLM
LRM
90
2
0
25 Feb 2025
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
G. Wang
Minyu Gao
Shuai Yang
Ya Zhang
Lizhi He
...
Yexuan Zhang
Wanyue Li
Lu Chen
Jintao Fei
Xin Li
423
2
0
25 Feb 2025
PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback
Nils Wandel
David Stotko
Alexander Schier
Reinhard Klein
77
0
0
25 Feb 2025
Rank1: Test-Time Compute for Reranking in Information Retrieval
Orion Weller
Kathryn Ricci
Eugene Yang
Andrew Yates
Dawn J Lawrie
Benjamin Van Durme
ReLM
AI4TS
LRM
184
12
0
25 Feb 2025
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang
Shuming Ma
Yankai Lin
Furu Wei
LRM
115
50
0
25 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
104
17
0
24 Feb 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi
Nishanth Dikkala
Zhiyuan Li
Sanjiv Kumar
Sashank J. Reddi
OffRL
LRM
AI4CE
159
22
0
24 Feb 2025
Spontaneous Giving and Calculated Greed in Language Models
Yuxuan Li
Hirokazu Shirado
ReLM
LRM
AI4CE
120
2
0
24 Feb 2025
CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought
Boxuan Zhang
Ruqi Zhang
LRM
76
3
0
24 Feb 2025
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
191
2
0
24 Feb 2025
AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay
Ziyi Tang
Zhenpeng Chen
Jiarui Yang
Jiayao Mai
Yongsen Zheng
Keze Wang
Jinrui Chen
Liang Lin
AIFin
112
2
0
24 Feb 2025
Navigation-GPT: A Robust and Adaptive Framework Utilizing Large Language Models for Navigation Applications
Feng Ma
Xiang Wang
Chen Chen
Xiao-bin Xu
Xin-ping Yan
474
0
0
23 Feb 2025
DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light
Wei Cheng
Benjamin Rivière
Wu Yue
Masafumi Oyamada
Mengdi Wang
Yisong Yue
Santiago Paternain
Haifeng Chen
ReLM
LRM
129
4
0
23 Feb 2025
CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
Chenlong Wang
Zhaoyang Chu
Zhengxiang Cheng
Xuyi Yang
Kaiyue Qiu
Yao Wan
Zhou Zhao
Xuanhua Shi
Benlin Liu
ALM
SyDa
105
0
0
23 Feb 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Nidhal Jegham
Marwan Abdelatti
Abdeltawab Hendawi
VLM
LRM
101
3
0
23 Feb 2025
Analyzing User Perceptions of Large Language Models (LLMs) on Reddit: Sentiment and Topic Modeling of ChatGPT and DeepSeek Discussions
Krishnaveni Katta
48
1
0
22 Feb 2025
Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals
Linda Zeng
Rithwik Gupta
Divij Motwani
Diji Yang
Yi Zhang
AAML
173
3
0
22 Feb 2025
Enhancing Domain-Specific Retrieval-Augmented Generation: Synthetic Data Generation and Evaluation using Reasoning Models
Aryan Jadon
Avinash Patil
Shashank Kumar
SyDa
91
1
0
21 Feb 2025
A Cautionary Tale About "Neutrally" Informative AI Tools Ahead of the 2025 Federal Elections in Germany
Ina Dormuth
Sven Franke
Marlies Hafer
Tim Katzke
Alexander Marx
Emmanuel Müller
Daniel Neider
Markus Pauly
Jérôme Rutinowski
118
1
0
21 Feb 2025
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Michael Luo
Xiaoxiang Shi
Colin Cai
Tianjun Zhang
Justin Wong
...
Chi Wang
Yanping Huang
Zhifeng Chen
Joseph E. Gonzalez
Ion Stoica
110
4
0
20 Feb 2025
Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics
Daniel J.H. Chung
Zhiqi Gao
Yurii Kvasiuk
Tianyi Li
Moritz Münchmeyer
Maja Rudolph
Frederic Sala
Sai Chaitanya Tadepalli
AIMat
101
7
0
19 Feb 2025
Inference of Abstraction for Grounded Predicate Logic
Hiroyuki Kido
NAI
51
0
0
19 Feb 2025
MatterChat: A Multi-Modal LLM for Material Science
Yingheng Tang
Wenbin Xu
Jie Cao
Jianzhu Ma
Weilu Gao
Steve Farrell
Benjamin Erichson
Michael W. Mahoney
Andy Nonaka
204
8
0
18 Feb 2025
Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards
Xinyi Yang
Liang Zeng
Heng Dong
Chao Yu
Xiaojun Wu
H. Yang
Yu Wang
Milind Tambe
Tonghan Wang
145
4
0
18 Feb 2025
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Hao Gao
Shaoyu Chen
Bo Jiang
Bencheng Liao
Yiang Shi
...
Xinbang Zhang
Y. Zhang
Wenyu Liu
Qian Zhang
Xinggang Wang
182
10
0
18 Feb 2025
Previous
1
2
3
...
24
25
26
27
Next