Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Title
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma
Linge Du
Xuyang Shen
Shaoxiang Chen
Pengfei Li
Qibing Ren
Lizhuang Ma
Yuchao Dai
Pengfei Liu
Junjie Yan
OffRL
LRM
139
0
0
23 May 2025
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL
Che Liu
Haozhe Wang
J. Pan
Zhongwei Wan
Yong Dai
Fangzhen Lin
Wenjia Bai
Daniel Rueckert
Rossella Arcucci
OffRL
LRM
ELM
118
1
0
23 May 2025
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Amirhosein Ghasemabadi
Keith G. Mills
Baochun Li
Di Niu
LRM
87
0
0
23 May 2025
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
Michael Hassid
Gabriel Synnaeve
Yossi Adi
Roy Schwartz
ReLM
LRM
118
1
0
23 May 2025
Two-Stage Regularization-Based Structured Pruning for LLMs
Mingkuan Feng
Jinyang Wu
Siyuan Liu
Shuai Zhang
Hongjian Fang
Ruihan Jin
Feihu Che
Pengpeng Shao
Zhengqi Wen
57
0
0
23 May 2025
Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving
Zixian Guo
Ming-Yu Liu
Zhilong Ji
Jinfeng Bai
Lei Zhang
W. Zuo
LRM
VLM
111
0
0
23 May 2025
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
Yue Jiang
Jichu Li
Yang Liu
Jinjie Wei
F. I. S. Kevin Zhou
Quyu Kong
MLLM
71
0
0
23 May 2025
Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN
Yao Xu
Mingyu Xu
Fangyu Lei
Wangtao Sun
Xiangrong Zeng
Bingning Wang
Guang Liu
Shizhu He
Jun Zhao
Kang Liu
LRM
84
1
0
22 May 2025
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects
Jipeng Zhang
Haolin Yang
Kehao Miao
Ruiyuan Zhang
Renjie Pi
Jiahui Gao
Xiaofang Zhou
194
0
0
22 May 2025
Training-Free Reasoning and Reflection in MLLMs
Hongchen Wei
Zhenzhong Chen
OffRL
VLM
LRM
115
0
0
22 May 2025
Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games
Xiaoqing Zhang
Huabin Zheng
Ang Lv
Yuhan Liu
Zirui Song
Flood Sung
Xiuying Chen
Rui Yan
OffRL
ReLM
LRM
AI4CE
128
0
0
22 May 2025
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
Rui Ye
Xiangrui Liu
Qimin Wu
Xianghe Pang
Zhenfei Yin
Lei Bai
Siheng Chen
LLMAG
86
0
0
22 May 2025
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
Meng-Hao Guo
Xuanyu Chu
Qianrui Yang
Zhe-Han Mo
Yiqing Shen
...
Kiyohiro Nakayama
Zhengyang Geng
Houwen Peng
Han Hu
Shi-Min Hu
LRM
204
0
0
22 May 2025
ReCopilot: Reverse Engineering Copilot in Binary Analysis
Guoqiang Chen
Huiqi Sun
Daguang Liu
Zhiqi Wang
Qiang Wang
Bin Yin
Lu Liu
Lingyun Ying
45
0
0
22 May 2025
Constant Bit-size Transformers Are Turing Complete
Qian Li
Yuyi Wang
LRM
25
0
0
22 May 2025
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
Junhong Lin
Xinyue Zeng
Jie Zhu
Song Wang
Julian Shun
Jun Wu
Dawei Zhou
LRM
167
1
0
22 May 2025
LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead
Yifan Zhang
Xinkui Zhao
Zuxin Wang
Guanjie Cheng
Yueshen Xu
Shuiguang Deng
Yuxiang Cai
103
0
0
22 May 2025
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
Huanjin Yao
Qixiang Yin
Jingyi Zhang
Min Yang
Yibo Wang
...
Fei Su
Li Shen
Minghui Qiu
Dacheng Tao
Jiaxing Huang
LRM
74
0
0
22 May 2025
A Comprehensive Evaluation of Contemporary ML-Based Solvers for Combinatorial Optimization
Shengyu Feng
Weiwei Sun
Shanda Li
Ameet Talwalkar
Yiming Yang
89
1
0
22 May 2025
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment
Shuhao Han
Haotian Fan
Fangyuan Kong
Wenjie Liao
Chunle Guo
...
Jian Guo
Zhizhuo Shao
Ziyu Feng
Bing Li
Weiming Hu
198
11
0
22 May 2025
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
Beier Luo
Shuoyuan Wang
Yixuan Li
Jianguo Huang
73
0
0
22 May 2025
Optimal Policy Minimum Bayesian Risk
Ramón Fernandez Astudillo
Md Arafat Sultan
Aashka Trivedi
Yousef El-Kurdi
Tahira Naseem
Radu Florian
Salim Roukos
OffRL
61
0
0
22 May 2025
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation
Seamus Somerstep
Vinod Raman
Unique Subedi
Yuekai Sun
78
0
0
22 May 2025
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan
Kaituo Feng
Haoming Lyu
Dongzhan Zhou
Xiangyu Yue
ReLM
LRM
154
0
0
22 May 2025
Foundation Models for Geospatial Reasoning: Assessing Capabilities of Large Language Models in Understanding Geometries and Topological Spatial Relations
Yuhan Ji
Song Gao
Ying Nie
Ivan Majic
K. Janowicz
ReLM
LRM
173
2
0
22 May 2025
AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
Woosung Koh
Wonbeen Oh
Jaein Jang
MinHyung Lee
Hyeongjin Kim
Ah Yeon Kim
Joonkee Kim
Junghyun Lee
Taehyeon Kim
Se-Young Yun
LRM
TTA
119
0
0
22 May 2025
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
Ilgee Hong
Changlong Yu
Liang Qiu
Weixiang Yan
Zhenghao Xu
...
Qingru Zhang
Qin Lu
Xin Liu
Chao Zhang
Tuo Zhao
OffRL
ReLM
LRM
88
0
0
22 May 2025
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development
Yaxin Du
Yuzhu Cai
Yifan Zhou
Cheng-Yu Wang
Yu Qian
Xianghe Pang
Qian Liu
Yue Hu
Siheng Chen
70
0
0
22 May 2025
When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques
Jianing Geng
Biao Yi
Zekun Fei
Tongxi Wu
Lihai Nie
Zheli Liu
AAML
52
0
0
22 May 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
116
0
0
22 May 2025
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Fanrui Zhang
Dian Li
Qiang Zhang
Chenjun
sinbadliu
Junxiong Lin
Jiahong Yan
Jiawei Liu
Zheng-Jun Zha
OffRL
53
0
0
22 May 2025
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Wang Yang
Zirui Liu
Hongye Jin
Qingyu Yin
Vipin Chaudhary
Xiaotian Han
ReLM
LRM
85
0
0
22 May 2025
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
Chenhao Zhang
Yazhe Niu
120
0
0
22 May 2025
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
Doohyuk Jang
Yoonjeon Kim
Chanjae Park
Hyun Ryu
Eunho Yang
LRM
105
0
0
22 May 2025
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Wenhui Tan
Jiaze Li
Jianzhong Ju
Zhenbo Luo
Jian Luan
Ruihua Song
ReLM
OffRL
LRM
109
1
0
22 May 2025
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong
Ziyu Guo
Renrui Zhang
Wenyu Shan
Xinyu Wei
Zhenghao Xing
Hongsheng Li
Pheng-Ann Heng
EGVM
OffRL
LRM
125
1
0
22 May 2025
Latent Principle Discovery for Language Model Self-Improvement
Keshav Ramji
Tahira Naseem
Ramón Fernandez Astudillo
LRM
113
0
0
22 May 2025
DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation
Bowen Zheng
Xiaolei Wang
Enze Liu
Xi Wang
Lu Hongyu
Yu Chen
Wayne Xin Zhao
Ji-Rong Wen
136
0
0
22 May 2025
R
2
ec
\text{R}^2\text{ec}
R
2
ec
: Towards Large Recommender Models with Reasoning
Runyang You
Chak Tou Leong
Xinyu Lin
Xin Zhang
Wenjie Wang
Wenjie Li
Liqiang Nie
LRM
97
0
0
22 May 2025
LLM Access Shield: Domain-Specific LLM Framework for Privacy Policy Compliance
Yu Wang
Cailing Cai
Zhihua Xiao
Peifung E. Lam
68
0
0
22 May 2025
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering
Bowen Jiang
Runchuan Zhu
Jiang Wu
Zinco Jiang
Yifan He
...
Haote Yang
Songyang Zhang
Dahua Lin
Lijun Wu
Conghui He
ELM
61
0
0
22 May 2025
MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems
Rui Ye
Keduan Huang
Qimin Wu
Yuzhu Cai
Tian Jin
...
Bo An
Yang Gao
Wenjun Wu
Lei Bai
Siheng Chen
LLMAG
141
1
0
22 May 2025
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
NovelSeek Team
Bo Zhang
Shiyang Feng
Xiangchao Yan
Jiakang Yuan
...
Zhongying Tu
Xiangyu Yue
W. Ouyang
Bowen Zhou
Lei Bai
LLMAG
128
2
0
22 May 2025
LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods
Hyang Cui
LRM
119
0
0
22 May 2025
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning
Shicheng Xu
Liang Pang
Yunchang Zhu
Jia Gu
Zihao Wei
Jingcheng Deng
Feiyang Pan
Huawei Shen
Xueqi Cheng
OffRL
LRM
126
0
0
22 May 2025
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities
Jinyang Wu
Chonghua Liao
Mingkuan Feng
Shuai Zhang
Zhengqi Wen
Pengpeng Shao
Huazhe Xu
Jianhua Tao
LRM
OffRL
158
3
0
21 May 2025
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
Jan Tönshoff
Martin Grohe
102
0
0
21 May 2025
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Zhexin Zhang
Yuhao Sun
Junxiao Yang
Shiyao Cui
Hongning Wang
Minlie Huang
AAML
106
0
0
21 May 2025
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Zhen Zhang
Xuehai He
Weixiang Yan
Ao Shen
Chenyang Zhao
Shuaiqiang Wang
Yelong Shen
Xin Eric Wang
LRM
123
3
0
21 May 2025
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Weixiang Zhao
Xingyu Sui
Yulin Hu
Jiahe Guo
Haixiao Liu
Biye Li
Yanyan Zhao
Bing Qin
Ting Liu
OffRL
115
1
0
21 May 2025
Previous
1
2
3
...
9
10
11
...
25
26
27
Next