2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
Kaiyang Guo
Yinchuan Li
Zhitang Chen
75
0
0
29 May 2025
Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition
Weizhe Kong
Xiao Wang
Ruichong Gao
Chenglong Li
Yu Zhang
Xing Yang
Yaowei Wang
Jin Tang
AAML
66
0
0
29 May 2025
Mamba Integrated with Physics Principles Masters Long-term Chaotic System Forecasting
Chang Liu
Bohao Zhao
Jingtao Ding
Huandong Wang
Yong Li
Mamba
AI4CE
43
0
0
29 May 2025
Scaling Reasoning without Attention
Xueliang Zhao
Wei Wu
Lingpeng Kong
OffRL
ReLM
LRM
VLM
86
0
0
28 May 2025
Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Donghyeon Joo
Helya Hosseini
Ramyad Hadidi
Bahar Asgari
74
0
0
28 May 2025
From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications
Feibo Jiang
Cunhua Pan
Li Dong
Kezhi Wang
O. Dobre
Mérouane Debbah
LLMAG
AI4TS
186
1
0
28 May 2025
Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
Lei Yu
Yechao Zhang
Ziqi Zhou
Yang Wu
Wei Wan
Minghui Li
Shengshan Hu
Pei Xiaobing
Jing Wang
AAML
35
0
0
28 May 2025
Pre-Training Curriculum for Multi-Token Prediction in Language Models
Ansar Aynetdinov
Alan Akbik
LRM
64
0
0
28 May 2025
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
Hanting Chen
Yasheng Wang
Kai Han
Dong Li
Lin Li
...
Hailin Hu
Yehui Tang
Dacheng Tao
Xinghao Chen
Yunhe Wang
LRM
105
0
0
28 May 2025
Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning
Chunyi Peng
Zhipeng Xu
Zhenghao Liu
Yishan Li
Yukun Yan
...
Zhiyuan Liu
Yu Gu
Minghe Yu
Ge Yu
Maosong Sun
LRM
114
1
0
28 May 2025
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models
Feng Luo
Yu-Neng Chuang
Guanchu Wang
Hoang Anh Duy Le
Shaochen Zhong
...
Jiayi Yuan
Yang Sui
Vladimir Braverman
Vipin Chaudhary
Helen Zhou
LRM
82
1
0
28 May 2025
Scaling Offline RL via Efficient and Expressive Shortcut Models
Nicolas Espinosa-Dice
Yiyi Zhang
Yiding Chen
Bradley Guo
Owen Oertell
Gokul Swamy
Kianté Brantley
Wen Sun
OffRL
LRM
79
0
0
28 May 2025
Training Language Models to Generate Quality Code with Program Analysis Feedback
Feng Yao
Zilong Wang
Liyuan Liu
Junxia Cui
Li Zhong
Xiaohan Fu
Haohui Mai
Vish Krishnan
Jianfeng Gao
Jingbo Shang
54
0
0
28 May 2025
When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy
Jirui Qi
Shan Chen
Zidi Xiong
Raquel Fernández
Danielle S. Bitterman
Arianna Bisazza
LRM
102
0
0
28 May 2025
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang
Zunnan Xu
Jun Zhou
Ting Liu
Yicheng Xiao
Mingwen Ou
Bowen Ji
Xiu Li
Kehong Yuan
VLM
101
0
0
28 May 2025
Compensating for Data with Reasoning: Low-Resource Machine Translation with LLMs
Samuel Frontull
Thomas Ströhle
LRM
40
0
0
28 May 2025
Fostering Video Reasoning via Next-Event Prediction
Haonan Wang
Hongfu Liu
Xiangyan Liu
C. Du
Kenji Kawaguchi
Ye Wang
Tianyu Pang
AI4TS
LRM
88
0
0
28 May 2025
ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room
Nikita Mehandru
Niloufar Golchini
David Bamman
Travis Zack
Melanie F. Molina
Ahmed Alaa
ELM
87
0
0
28 May 2025
SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model
Yifan Chang
Yukang Feng
Jianwen Sun
Jiaxin Ai
Chuanhao Li
Sizhuo Zhou
Kaipeng Zhang
EGVM
86
0
0
28 May 2025
Maximizing Confidence Alone Improves Reasoning
Mihir Prabhudesai
Lili Chen
Alex Ippoliti
Katerina Fragkiadaki
Hao Liu
Deepak Pathak
OOD
OffRL
ReLM
LRM
144
3
0
28 May 2025
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
Qiuchen Wang
Ruixue Ding
Y. Zeng
Zehui Chen
Lin Yen-Chen
Shihang Wang
Pengjun Xie
Fei Huang
Feng Zhao
VLM
LRM
90
0
0
28 May 2025
Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced (R^2)GRPO
Ran Li
Shimin Di
Yuchen Liu
Chen Jing
Yu Qiu
Lei Chen
LRM
81
0
0
28 May 2025
ChatCFD: an End-to-End CFD Agent with Domain-specific Structured Thinking
E Fan
Weizong Wang
Tianhan Zhang
ALM
AI4CE
31
0
0
28 May 2025
ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark
M. Shalyt
Rotem Elimelech
I. Kaminer
37
0
0
28 May 2025
Reinforced Reasoning for Embodied Planning
Di Wu
Jiaxin Fan
Junzhe Zang
G. Wang
Wei Yin
Wenhao Li
Bo Jin
LRM
129
0
0
28 May 2025
Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling
Fanzeng Xia
Yidong Luo
Tinko Sebastian Bartels
Yaqi Xu
Tongxin Li
ReLM
LRM
102
0
0
28 May 2025
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models
Siqi Fan
Peng Han
Shuo Shang
Yequan Wang
Aixin Sun
LLMAG
LRM
103
3
0
28 May 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM
LRM
VLM
110
0
0
28 May 2025
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
Lai Wei
Yuting Li
Kaipeng Zheng
Chen Wang
Yue Wang
Linghe Kong
Lichao Sun
Weiran Huang
OffRL
ReLM
LRM
106
1
0
28 May 2025
Let's Predict Sentence by Sentence
Hyeonbin Hwang
Byeongguk Jeon
Seungone Kim
Jiyeon Kim
Hoyeon Chang
Sohee Yang
Seungpil Won
Dohaeng Lee
Youbin Ahn
Minjoon Seo
96
0
0
28 May 2025
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving
Muxi Diao
Lele Yang
Hongbo Yin
Zhexu Wang
Yejie Wang
Daxin Tian
Kongming Liang
Zhanyu Ma
VLM
LRM
69
1
0
27 May 2025
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley
Mingyu Chen
Zhaolin Gao
Jason D. Lee
Wen Sun
Wenhao Zhan
Xuezhou Zhang
OffRL
LRM
96
1
0
27 May 2025
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Hao Li
He Cao
Bin Feng
Yanjun Shao
Xiangru Tang
Zhiyuan Yan
Li Yuan
Yonghong Tian
Yu-Feng Li
LRM
ELM
93
0
0
27 May 2025
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Muzhi Zhu
Hao Zhong
Canyu Zhao
Zongze Du
Zheng Huang
...
Hao Chen
Cheng Zou
Jingdong Chen
Ming-Hsuan Yang
Chunhua Shen
LRM
178
0
0
27 May 2025
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
Xuanwen Ding
Chengjun Pan
Zejun Li
Jiwen Zhang
Siyuan Wang
Zhongyu Wei
73
0
0
27 May 2025
SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
Ting Xu
Zhichao Huang
Jiankai Sun
Shanbo Cheng
Wai Lam
OffRL
29
0
0
27 May 2025
EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models
Chengyu Wang
Junbing Yan
Wenrui Cai
Yuanhao Yue
Jun Huang
VLM
55
0
0
27 May 2025
Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan
F. Tonin
Volkan Cevher
83
0
0
27 May 2025
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
Jiyoung Lee
Seungho Kim
Jieun Han
Jun-Min Lee
Kitaek Kim
Alice Oh
E. Choi
75
0
0
27 May 2025
Reinforced Informativeness Optimization for Long-Form Retrieval-Augmented Generation
Yuhao Wang
Ruiyang Ren
Yucheng Wang
Wayne Xin Zhao
Jing Liu
Hua Wu
Haifeng Wang
RALM
OffRL
84
0
0
27 May 2025
Reinforcing General Reasoning without Verifiers
Xiangxin Zhou
Zichen Liu
Anya Sims
Haonan Wang
Tianyu Pang
Chongxuan Li
Liang Wang
Min Lin
C. Du
OffRL
LRM
94
2
0
27 May 2025
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions
Hadi Askari
Shivanshu Gupta
Fei Wang
Anshuman Chhabra
Muhao Chen
TDI
66
0
0
27 May 2025
Hardware-Efficient Attention for Fast Decoding
Ted Zadouri
Hubert Strauss
Tri Dao
83
2
0
27 May 2025
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An
Ruochen Wang
Tianyi Zhou
Cho-Jui Hsieh
KELM
LRM
96
1
0
27 May 2025
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge
Fengqing Jiang
Fengbo Ma
Zhangchen Xu
Yuetai Li
Bhaskar Ramasubramanian
Luyao Niu
Bo Li
Xianyan Chen
Zhen Xiang
Radha Poovendran
ALM
ELM
85
1
0
27 May 2025
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
Mingyang Song
Mao Zheng
OffRL
LRM
102
1
0
27 May 2025
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
Zijun Liu
Zhennan Wan
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
LLMAG
89
0
0
27 May 2025
TeroSeek: An AI-Powered Knowledge Base and Retrieval Generation Platform for Terpenoid Research
Xu Kang
Siqi Jiang
Kangwei Xu
Jiahao Li
Ruibo Wu
RALM
47
0
0
27 May 2025
HCQA-1.5 @ Ego4D EgoSchema Challenge 2025
Haoyu Zhang
Yisen Feng
Qiaohui Chu
Meng Liu
Weili Guan
Yaowei Wang
Liqiang Nie
47
3
0
27 May 2025
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
Yongchao Chen
Y. Liu
Junwei Zhou
Yilun Hao
Jingquan Wang
Yang Zhang
Chuchu Fan
OffRL
ReLM
AI4TS
SyDa
ALM
LRM
81
0
0
27 May 2025