2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Title
Graph-based Approaches and Functionalities in Retrieval-Augmented Generation: A Comprehensive Survey
Zulun Zhu
Tiancheng Huang
Kai Wang
Junda Ye
Xiao Chen
Siqiang Luo
3DV
152
0
0
08 Apr 2025
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Jingyuan Zhang
Qi Wang
Xingguang Ji
Yang Liu
Yang Yue
Fuzheng Zhang
Di Zhang
Guorui Zhou
Kun Gai
LRM
119
7
0
08 Apr 2025
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Sanchit Kabra
Akshita Jha
Chandan K. Reddy
LRM
184
1
0
08 Apr 2025
SkillFlow: Efficient Skill and Code Transfer Through Communication in Adapting AI Agents
Pagkratios Tagkopoulos
Fangzhou Li
I. Tagkopoulos
50
0
0
08 Apr 2025
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Pengfei Zhou
Fanrui Zhang
Xiaopeng Peng
Zhaopan Xu
Jiaxin Ai
...
Kai Wang
Xiaojun Chang
Wenqi Shao
Yang You
Jianchao Tan
ELM
LRM
113
3
0
08 Apr 2025
Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness
Dongzhuoran Zhou
Yuqicheng Zhu
Yuan He
Jiaoyan Chen
Evgeny Kharlamov
Steffen Staab
RALM
135
1
0
07 Apr 2025
Concise Reasoning via Reinforcement Learning
Mehdi Fatemi
Banafsheh Rafiee
Mingjie Tang
Kartik Talamadupula
ReLM
OffRL
LRM
145
17
0
07 Apr 2025
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
AAML
147
0
0
07 Apr 2025
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
Zonghang Li
Tao Li
Wenjiao Feng
Mohsen Guizani
Hongfang Yu
34
0
0
07 Apr 2025
The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning
Dong Won Lee
Y. Kim
Denison Guvenoz
Sooyeon Jeong
Parker Malachowsky
Louis-Philippe Morency
C. Breazeal
Hae Won Park
95
0
0
07 Apr 2025
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu
Yuxuan Sun
Manyi Zhang
Haoli Bai
Xianzhi Yu
Tiezheng Yu
C. Yuan
Lu Hou
MQ
LRM
134
11
0
07 Apr 2025
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
Runjin Chen
Zhenyu Zhang
Junyuan Hong
Souvik Kundu
Zhangyang Wang
OffRL
LRM
164
14
0
07 Apr 2025
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
Rem Yang
Julian Dai
N. Vasilakis
Martin Rinard
ELM
LRM
52
1
0
07 Apr 2025
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
Emre Can Acikgoz
Cheng Qian
Hongru Wang
Vardhan Dongre
Xiusi Chen
Heng Ji
Dilek Hakkani-Tur
Gokhan Tur
LM&Ro
ELM
218
1
0
07 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
107
16
0
07 Apr 2025
scAgent: Universal Single-Cell Annotation via a LLM Agent
Yuren Mao
Yu Mi
Peigen Liu
Mengfei Zhang
Hanqing Liu
Yunjun Gao
LLMAG
60
2
0
07 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Yujin Potter
Tianneng Shi
Zhun Wang
Andy Zhang
Dawn Song
120
2
0
07 Apr 2025
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie
Azalia Mirhoseini
Hao Zhou
Irene Cai
Christopher D. Manning
SyDa
OffRL
ReLM
LRM
198
11
0
07 Apr 2025
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Yu Yue
Yufeng Yuan
Qiying Yu
Xiaochen Zuo
Ruofei Zhu
...
Ru Zhang
Xin Liu
Mingxuan Wang
Yonghui Wu
Lin Yan
OffRL
LRM
154
39
0
07 Apr 2025
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Ran Xu
W. Shi
Yuchen Zhuang
Yue Yu
Joyce C. Ho
Haoyu Wang
Carl Yang
79
3
0
07 Apr 2025
Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
Adrián Bazaga
Rexhina Blloshmi
Bill Byrne
Adria de Gispert
ReLM
LRM
109
1
0
07 Apr 2025
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization
Weiwei Sun
Shengyu Feng
Shanda Li
Yiming Yang
LLMAG
99
5
0
06 Apr 2025
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
109
5
0
06 Apr 2025
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models
Yuheng Wu
Wentao Guo
Zirui Liu
Heng Ji
Zhaozhuo Xu
Denghui Zhang
84
0
0
05 Apr 2025
Rethinking Reflection in Pre-Training
Essential AI
Darsh J Shah
Peter Rushton
Somanshu Singla
Mohit Parmar
...
Philip Monk
Platon Mazarakis
Ritvik Kapila
Saurabh Srivastava
Tim Romanski
ReLM
LRM
187
14
0
05 Apr 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
103
2
0
05 Apr 2025
Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
Kate Sanders
Benjamin Van Durme
LRM
155
1
0
04 Apr 2025
Learning Lie Group Generators from Trajectories
Lifan Hu
151
9
0
04 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
93
4
0
04 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
97
3
0
04 Apr 2025
Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition
Rishi Hazra
Gabriele Venturato
Pedro Zuidberg Dos Martires
Luc de Raedt
ReLM
LRM
119
2
0
04 Apr 2025
Towards Effective EU E-Participation: The Development of AskThePublic
Kilian Sprenkamp
Nils Messerschmidt
Amir Sartipi
Igor Tchappi
Xiaohui Wu
L. Zavolokina
Gilbert Fridgen
67
0
0
04 Apr 2025
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Simon A. Lee
Anthony Wu
Jeffrey N. Chiang
MedIm
106
6
0
04 Apr 2025
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models
Hung Le
Dai Do
D. Nguyen
Svetha Venkatesh
OffRL
LRM
84
1
0
03 Apr 2025
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Liangjie Huang
Dawei Li
Huan Liu
Lu Cheng
LRM
123
0
0
03 Apr 2025
AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology
Xiang Feng
Wentao Jiang
Zengmao Wang
Yong Luo
Pingbo Xu
Baosheng Yu
Hua Jin
Bo Du
Jing Zhang
ELM
LRM
86
0
0
03 Apr 2025
Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence
Anita Rau
Mark Endo
Josiah Aklilu
Jaewoo Heo
Khaled Saab
Alberto Paderno
Jeffrey Jopling
F. C. Holsinger
Serena Yeung-Levy
101
1
0
03 Apr 2025
Affordable AI Assistants with Knowledge Graph of Thoughts
Maciej Besta
Lorenzo Paleari
Jia Hao Andrea Jiang
Robert Gerstenberger
You Wu
...
Torsten Hoefler
Grzegorz Kwaśniewski
Marcin Copik
H. Niewiadomski
Torsten Hoefler
LLMAG
RALM
525
0
0
03 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
115
3
0
03 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRL
LRM
145
9
0
03 Apr 2025
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?
Andres Algaba
Vincent Holst
Floriano Tori
Melika Mobini
Brecht Verbeken
Sylvia Wenmackers
Vincent Ginis
128
1
0
03 Apr 2025
Generative Evaluation of Complex Reasoning in Large Language Models
Haowei Lin
Xiang Wang
Ruilin Yan
Baizhou Huang
Haotian Ye
Jianhua Zhu
Zihao Wang
James Zou
Jianzhu Ma
Yitao Liang
ReLM
ELM
LRM
449
0
0
03 Apr 2025
Noiser: Bounded Input Perturbations for Attributing Large Language Models
Mohammad Reza Ghasemi Madani
Aryo Pradipta Gema
Gabriele Sarti
Yu Zhao
Pasquale Minervini
Andrea Passerini
AAML
121
1
0
03 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
130
4
0
03 Apr 2025
Understanding Aha Moments: from External Observations to Internal Mechanisms
Shu Yang
Junchao Wu
Xin Chen
Yunze Xiao
Xinyi Yang
Derek F. Wong
Di Wang
LRM
81
10
0
03 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRL
LRM
217
54
0
03 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
171
5
0
03 Apr 2025
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou
Zengzhi Wang
Nikhil Ranjan
Zhoujun Cheng
Liping Tang
Guowei He
Zhengzhong Liu
Eric P. Xing
LRM
151
3
0
03 Apr 2025
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Daoguang Zan
Zhirong Huang
Wei Liu
Hanwu Chen
L. Zhang
...
Jing Su
Tianyu Liu
Rui Long
Kai Shen
Liang Xiang
115
7
0
03 Apr 2025
Statics of continuum planar grasping
Udit Halder
44
0
0
03 Apr 2025