Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Title
Superalignment with Dynamic Human Values
Florian Mai
David Kaczér
Nicholas Kluge Corrêa
Lucie Flek
140
0
0
17 Mar 2025
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu
Yunze Liu
Chonghan Liu
Miao Liu
VGen
LRM
120
7
0
16 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
185
2
0
16 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yize Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
225
31
0
16 Mar 2025
PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices
Yangyijian Liu
Jun Yu Li
Wu-Jun Li
70
0
0
15 Mar 2025
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Arsh Koneru
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VLM
152
5
0
15 Mar 2025
Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes
Da Wu
Zhanliang Wang
Quan Nguyen
Kai Wang
464
1
0
15 Mar 2025
SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning
Edward Y. Chang
Longling Geng
LLMAG
LRM
138
5
0
15 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Jun Wang
Jun Wang
436
0
0
15 Mar 2025
Residual Policy Gradient: A Reward View of KL-regularized Objective
Pengcheng Wang
Xinghao Zhu
Yuxin Chen
Chenfeng Xu
Masayoshi Tomizuka
Chenran Li
87
0
0
14 Mar 2025
Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms
Camilo Chacón Sartori
Christian Blum
84
0
0
14 Mar 2025
Implicit Bias-Like Patterns in Reasoning Models
Messi H.J. Lee
Calvin K. Lai
LRM
129
0
0
14 Mar 2025
A Review of DeepSeek Models' Key Innovative Techniques
Chengen Wang
Murat Kantarcioglu
VLM
OffRL
102
4
0
14 Mar 2025
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang
Yang Liu
Weixing Chen
Jingzhou Luo
Ziliang Chen
Ling Pan
G. Li
Liang Lin
110
4
0
14 Mar 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
190
39
0
14 Mar 2025
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
Gang Li
Jizhong Liu
Heinrich Dinkel
Yadong Niu
Junbo Zhang
Jian Luan
OffRL
LRM
ReLM
167
12
0
14 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
108
1
0
14 Mar 2025
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?
Giacomo Camposampiero
Michael Hersche
Roger Wattenhofer
Abu Sebastian
Abbas Rahimi
LRM
111
2
0
14 Mar 2025
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Zixu Cheng
Jian Hu
Ziquan Liu
Chenyang Si
Wei Li
Shaogang Gong
LRM
150
5
0
14 Mar 2025
LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions
Gaurav Kumar Gupta
Pranal Pande
Nirajan Acharya
Aniket Kumar Singh
Suman Niroula
81
0
0
13 Mar 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang
Xiaoxuan He
Hongkun Pan
Xiyan Jiang
Yan Deng
...
Dacheng Yin
Fengyun Rao
Minfeng Zhu
Bo Zhang
Wei Chen
VLM
LRM
159
100
1
13 Mar 2025
New Trends for Modern Machine Translation with Large Reasoning Models
Sinuo Liu
Chenyang Lyu
Mingyang Wu
Longyue Wang
Weihua Luo
Kaifu Zhang
Zifu Shang
LRM
151
7
0
13 Mar 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
226
7
0
13 Mar 2025
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
Siyin Wang
Zhaoye Fei
Qinyuan Cheng
Shanghang Zhang
Panpan Cai
Jinlan Fu
Xipeng Qiu
85
2
0
13 Mar 2025
Collaborative Speculative Inference for Efficient LLM Inference Serving
Luyao Gao
Jianchun Liu
Hongli Xu
Xichong Zhang
Yunming Liao
Liusheng Huang
110
1
0
13 Mar 2025
Transformers without Normalization
Jiachen Zhu
Xinlei Chen
Kaiming He
Yann LeCun
Zhuang Liu
OffRL
ViT
169
20
0
13 Mar 2025
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
Xudong Tan
Peng Ye
Chongjun Tu
Jianjian Cao
Yaoxin Yang
Lin Zhang
Dongzhan Zhou
Tao Chen
VLM
161
3
0
13 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
Lawrence Yunliang Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
156
39
0
13 Mar 2025
Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback
Derun Li
Jianwei Ren
Y. Wang
Xin Wen
Pengxiang Li
...
Zhongpu Xia
Peng Jia
Xianpeng Lang
Ningyi Xu
Hang Zhao
121
7
0
13 Mar 2025
Global Position Aware Group Choreography using Large Language Model
Haozhou Pang
Tianwei Ding
Lanshan He
Qi Gan
SLR
96
0
0
12 Mar 2025
Learning richness modulates equality reasoning in neural networks
William L. Tong
Cengiz Pehlevan
71
0
0
12 Mar 2025
Reinforcement Learning is all You Need
Yongsheng Lian
ReLM
OffRL
LRM
102
0
0
12 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu
Dong Gong
Erdun Gao
Zhen Zhang
Zhen Zhang
Biwei Huang
Anton van den Hengel
Javen Qinfeng Shi
Javen Qinfeng Shi
465
1
0
12 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Dong Wang
Sercan O. Arik
Dong Wang
Hamed Zamani
Jiawei Han
RALM
ReLM
KELM
OffRL
AI4TS
LRM
238
122
0
12 Mar 2025
Chemical reasoning in LLMs unlocks steerable synthesis planning and reaction mechanism elucidation
Andres M Bran
Theo A Neukomm
Daniel Armstrong
Zlatko Joncev
Philippe Schwaller
LRM
130
3
0
11 Mar 2025
DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
Minjun Zhu
Yixuan Weng
Linyi Yang
Yue Zhang
ALM
LRM
116
7
0
11 Mar 2025
HOFAR: High-Order Augmentation of Flow Autoregressive Transformers
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
Mingda Wan
172
1
0
11 Mar 2025
ProtTeX: Structure-In-Context Reasoning and Editing of Proteins with Large Language Models
Zicheng Ma
Chuanliu Fan
Zhicong Wang
Zhenyu Chen
Xiaohan Lin
Yongqian Li
Shihao Feng
Jun Zhang
Ziqiang Cao
Y. Gao
127
0
0
11 Mar 2025
General-Purpose Aerial Intelligent Agents Empowered by Large Language Models
Ji Zhao
Xiao Lin
LLMAG
113
2
0
11 Mar 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Iván Arcuschin
Jett Janiak
Robert Krzyzanowski
Senthooran Rajamanoharan
Neel Nanda
Arthur Conmy
ReLM
LRM
204
20
0
11 Mar 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Muzhi Zhu
Yuzhuo Tian
Hao Chen
Chunluan Zhou
Qingpei Guo
Yongxu Liu
M. Yang
Chunhua Shen
MLLM
VLM
133
1
0
11 Mar 2025
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
Pol G. Recasens
Ferran Agullo
Yue Zhu
Chen Wang
Eun Kyung Lee
Olivier Tardieu
Jordi Torres
Josep Ll. Berral
94
1
0
11 Mar 2025
Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method
Fei Wang
Chong Chen
Hongyu Chen
Yugang Chang
Weiming Zeng
ObjD
122
0
0
11 Mar 2025
LightPlanner: Unleashing the Reasoning Capabilities of Lightweight Large Language Models in Task Planning
Weijie Zhou
Yi Peng
Manli Tao
Chaoyang Zhao
Honghui Dong
Ming Tang
Jinqiao Wang
LLMAG
LRM
115
1
0
11 Mar 2025
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
Mamba
MLLM
158
1
0
11 Mar 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei
Yijun Yang
Junliang Xing
Yuanchun Shi
Zongqing Lu
Deheng Ye
OffRL
LRM
95
2
0
11 Mar 2025
AI-native Memory 2.0: Second Me
Jiale Wei
Xiang Ying
Tao Gao
Fangyi Bao
Felix Tao
Jingbo Shang
132
1
0
11 Mar 2025
Guess What I am Thinking: A Benchmark for Inner Thought Reasoning of Role-Playing Language Agents
R. Xu
Mingyu Wang
Xintao Wang
Dakuan Lu
Jue Chen
Wei Chu
Yinghui Xu
LRM
LLMAG
151
1
0
11 Mar 2025
HELM: Human-Preferred Exploration with Language Models
Shuhao Liao
Xuxin Lv
Yuhong Cao
Jeric Lew
Wenjun Wu
Guillaume Sartoretti
116
0
0
10 Mar 2025
Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
Fan Yin
Zifeng Wang
I-Hung Hsu
Jun Yan
Ke Jiang
...
L. Le
Kai-Wei Chang
Chen-Yu Lee
Hamid Palangi
Tomas Pfister
124
4
0
10 Mar 2025
Previous
1
2
3
...
22
23
24
25
26
27
Next