Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 830 papers shown
Title
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie
Mingbao Lin
Ziqiang Liu
Pengcheng Wu
Shuicheng Yan
Chunyan Miao
AuLLM
OffRL
LRM
87
9
0
04 Mar 2025
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models
Joykirat Singh
Tanmoy Chakraborty
A. Nambi
AI4Cl
LRM
ReLM
60
1
0
04 Mar 2025
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Paul Janson
Vaibhav Singh
Paria Mehrbod
Adam Ibrahim
Irina Rish
Eugene Belilovsky
Benjamin Thérien
CLL
83
0
0
04 Mar 2025
Learning from Failures in Multi-Attempt Reinforcement Learning
Stephen Chung
Wenyu Du
Jie Fu
LRM
42
1
0
04 Mar 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu
Zeyi Sun
Yuhang Zang
Xiaoyi Dong
Yuhang Cao
Haodong Duan
Dahua Lin
Jiaqi Wang
ObjD
VLM
LRM
80
49
0
03 Mar 2025
Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution
Keliang Li
Tianhua Zhang
Yunxiang Li
Hongyin Luo
Abdalla Moustafa
Xixin Wu
James Glass
Helen Meng
68
0
0
03 Mar 2025
Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty
Yao Wang
Mingxuan Cui
Arthur Jiang
77
0
0
03 Mar 2025
Using (Not so) Large Language Models for Generating Simulation Models in a Formal DSL -- A Study on Reaction Networks
J. N. Kreikemeyer
Miłosz Jankowski
Pia Wilsdorf
A. Uhrmacher
79
0
0
03 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM
LRM
93
39
0
03 Mar 2025
Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning
Wenjie Wu
Yongcheng Jing
Yingjie Wang
Wenbin Hu
Dacheng Tao
RALM
LRM
76
2
0
03 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Wenjie Qu
Xiren Zhou
MoE
SyDa
78
32
0
03 Mar 2025
Adaptively profiling models with task elicitation
Davis Brown
Prithvi Balehannina
Helen Jin
Shreya Havaldar
Hamed Hassani
Eric Wong
ALM
ELM
114
0
0
03 Mar 2025
Comparative Analysis of OpenAI GPT-4o and DeepSeek R1 for Scientific Text Categorization Using Prompt Engineering
A. Maiti
Samuel Adewumi
Temesgen Alemayehu Tikure
Zichun Wang
Niladri Sengupta
Anastasiia Sukhanova
Ananya Jana
ELM
VLM
52
1
0
03 Mar 2025
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
Meghana Arakkal Rajeev
Rajkumar Ramamurthy
Prapti Trivedi
Vikas Yadav
Oluwanifemi Bamgbose
Sathwik Tejaswi Madhusudan
James Zou
Nazneen Rajani
AAML
LRM
58
2
0
03 Mar 2025
What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret
Yufeng Yuan
Yu Yue
Ruofei Zhu
Tiantian Fan
Lin Yan
OffRL
67
13
0
03 Mar 2025
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar
Enzhuo Zhang
Zhenshi Li
Feng-Xue Gu
Yanglangxing He
Pengfeng Xiao
Xueliang Zhang
58
3
0
02 Mar 2025
Output Length Effect on DeepSeek-R1's Safety in Forced Thinking
Xuying Li
Zhuo Li
Yuji Kosuga
Victor Bian
AAML
LRM
69
4
0
02 Mar 2025
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Toby Simonds
Akira Yoshiyama
LRM
48
3
0
02 Mar 2025
Instructor-Worker Large Language Model System for Policy Recommendation: a Case Study on Air Quality Analysis of the January 2025 Los Angeles Wildfires
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
39
1
0
01 Mar 2025
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Yichang Xu
Ling Liu
58
13
0
01 Mar 2025
An evaluation of DeepSeek Models in Biomedical Natural Language Processing
Zaifu Zhan
Shuang Zhou
Huixue Zhou
Jiawen Deng
Yu Hou
Jeremy Yeung
Rui Zhang
ELM
59
0
0
01 Mar 2025
Collective Reasoning Among LLMs A Framework for Answer Validation Without Ground Truth
Seyed Pouyan Mousavi Davoudi
Alireza Shafiee Fard
Alireza Amiri-Margavi
LRM
64
0
0
28 Feb 2025
FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
Ge Li
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
46
0
0
28 Feb 2025
À la recherche du sens perdu: your favourite LLM might have more to say than you can understand
K. O. T. Erziev
46
0
0
28 Feb 2025
Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs
Xiaomin Li
Zhou Yu
Ziji Zhang
Yingying Zhuang
Shri Kiran Srinivasan
Narayanan Sadagopan
Anurag Beniwal
HILM
65
0
0
28 Feb 2025
Can Textual Gradient Work in Federated Learning?
Minghui Chen
Ruinan Jin
Wenlong Deng
Yuanyuan Chen
Zhi Huang
Han Yu
Xiaoxiao Li
FedML
94
2
0
27 Feb 2025
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Toru Lin
Kartik Sachdev
Linxi Fan
Jitendra Malik
Yuke Zhu
54
8
0
27 Feb 2025
Climate And Resource Awareness is Imperative to Achieving Sustainable AI (and Preventing a Global AI Arms Race)
Pedram Bakhtiarifard
Pınar Tözün
Christian Igel
Raghavendra Selvan
62
0
0
27 Feb 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
RALM
LRM
69
0
0
27 Feb 2025
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Minggui He
Yilun Liu
Shimin Tao
Yuanchang Luo
Hongyong Zeng
...
Daimeng Wei
Weibin Meng
Hao Yang
Boxing Chen
Osamu Yoshie
LRM
73
3
0
27 Feb 2025
Deterministic or probabilistic? The psychology of LLMs as random number generators
Javier Coronado-Blázquez
50
1
0
27 Feb 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Ziyi Wang
Yang Zhou
ELM
VLM
LRM
61
1
0
27 Feb 2025
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems
Longling Geng
Edward Y. Chang
LLMAG
79
1
0
26 Feb 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
Jing Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Zhenru Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM
LRM
86
9
0
26 Feb 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
122
8
0
26 Feb 2025
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
Tong Wu
Junzhe Shen
Zixia Jia
Yuanda Wang
Zilong Zheng
85
0
0
26 Feb 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRM
LM&MA
VLM
75
15
0
26 Feb 2025
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation
Humza Sami
Mubashir ul Islam
Samy Charas
Asav Gandhi
P. Gaillardon
V. Tenace
LLMAG
78
0
0
26 Feb 2025
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha
Shashwat Goel
Ponnurangam Kumaraguru
Jonas Geiping
Matthias Bethge
Ameya Prabhu
ReLM
ELM
LRM
131
0
0
26 Feb 2025
General Reasoning Requires Learning to Reason from the Get-go
Seungwook Han
Jyothish Pari
Samuel J. Gershman
Pulkit Agrawal
LRM
199
1
0
26 Feb 2025
NeoBERT: A Next-Generation BERT
Lola Le Breton
Quentin Fournier
Mariam El Mezouar
Sarath Chandar
AI4TS
82
1
0
26 Feb 2025
Rank1: Test-Time Compute for Reranking in Information Retrieval
Orion Weller
Kathryn Ricci
Eugene Yang
Andrew Yates
Dawn J Lawrie
Benjamin Van Durme
ReLM
AI4TS
LRM
130
5
0
25 Feb 2025
PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback
Nils Wandel
David Stotko
Alexander Schier
Reinhard Klein
38
0
0
25 Feb 2025
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
G. Wang
Minyu Gao
Shuai Yang
Ya Zhang
Lizhi He
...
Yexuan Zhang
Wanyue Li
Lu Chen
Jintao Fei
Xin Li
203
1
0
25 Feb 2025
DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning
Pusheng Xu
Yue Wu
Kai Jin
Xiaolan Chen
M. He
Danli Shi
ELM
VLM
LRM
60
1
0
25 Feb 2025
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang
Shuming Ma
Yankai Lin
Furu Wei
LRM
53
25
0
25 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
44
7
0
24 Feb 2025
AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay
Ziyi Tang
Zhenpeng Chen
Jiarui Yang
Jiayao Mai
Yongsen Zheng
Keze Wang
Jinrui Chen
Liang Lin
AIFin
58
2
0
24 Feb 2025
CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought
Boxuan Zhang
Ruqi Zhang
LRM
37
2
0
24 Feb 2025
Spontaneous Giving and Calculated Greed in Language Models
Yuxuan Li
Hirokazu Shirado
ReLM
LRM
AI4CE
48
1
0
24 Feb 2025
Previous
1
2
3
...
14
15
16
17
Next