2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Title
HCQA-1.5 @ Ego4D EgoSchema Challenge 2025
Haoyu Zhang
Yisen Feng
Qiaohui Chu
Meng Liu
Weili Guan
Yaowei Wang
Liqiang Nie
49
3
0
27 May 2025
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Muzhi Zhu
Hao Zhong
Canyu Zhao
Zongze Du
Zheng Huang
...
Hao Chen
Cheng Zou
Jingdong Chen
Ming-Hsuan Yang
Chunhua Shen
LRM
178
0
0
27 May 2025
EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding
Zhaowei Zhang
Minghua Yi
Mengmeng Wang
Fengshuo Bai
Zilong Zheng
Yipeng Kang
Yaodong Yang
82
1
0
26 May 2025
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Jiangjie Chen
Qianyu He
Siyu Yuan
Aili Chen
Zhicheng Cai
...
Qiying Yu
Xuefeng Li
Jiaze Chen
Hao Zhou
Mingxuan Wang
ReLM
LRM
120
2
0
26 May 2025
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers
Zhengliang Shi
Lingyong Yan
Dawei Yin
Suzan Verberne
Maarten de Rijke
Zhaochun Ren
LRM
115
1
0
26 May 2025
Interleaved Reasoning for Large Language Models via Reinforcement Learning
Roy Xie
David Qiu
Deepak Gopinath
Dong Lin
Yanchao Sun
Chong-Jun Wang
Saloni Potdar
Bhuwan Dhingra
KELM
LRM
88
0
0
26 May 2025
Modeling Beyond MOS: Quality Assessment Models Must Integrate Context, Reasoning, and Multimodality
M. A. Kerkouri
Marouane Tliba
Aladine Chetouani
Nour Aburaed
Alessandro Bruno
90
1
0
26 May 2025
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
Yifan Wu
Jingze Shi
Bingheng Wu
Jiayi Zhang
Xiaotian Lin
Nan Tang
Yuyu Luo
LRM
100
1
0
26 May 2025
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Jun Rao
Min Zhang
OffRL
LRM
90
0
0
26 May 2025
Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks
Chang Liu
Haomin Zhang
Shiyu Xia
Zihao Chen
Chaofan Ding
Xin Yue
Huizhe Chen
Xinhan Di
65
0
0
26 May 2025
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi
Davide Bucciarelli
Federico Betti
Marcella Cornia
Lorenzo Baraldi
N. Sebe
Rita Cucchiara
242
0
0
26 May 2025
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
Chao Huang
Benfeng Wang
Jie Wen
Chengliang Liu
Wei Wang
Li Shen
Xiaochun Cao
LRM
81
0
0
26 May 2025
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni
Zhengyuan Yang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
W. Zuo
Lijuan Wang
ReLM
LRM
97
1
0
26 May 2025
Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection
Zhihong Pan
Kai Zhang
Yuze Zhao
Yupeng Han
LRM
66
0
0
26 May 2025
CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation
Guang Yang
Yu Zhou
Xiang Chen
Wei-Shi Zheng
Xing Hu
Xin Zhou
David Lo
Taolue Chen
ALM
LRM
100
0
0
26 May 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
80
2
0
26 May 2025
TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs
Juntong Wang
Jiarui Wang
Huiyu Duan
Guangtao Zhai
Xiongkuo Min
52
1
0
26 May 2025
Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
Matthew Lisondra
B. Benhabib
G. Nejat
LM&Ro
91
0
0
26 May 2025
Lifelong Safety Alignment for Language Models
Haoyu Wang
Zeyu Qin
Yifei Zhao
C. Du
Min Lin
Xueqian Wang
Tianyu Pang
KELM
CLL
72
1
0
26 May 2025
The Coverage Principle: A Framework for Understanding Compositional Generalization
Hoyeon Chang
Jinho Park
Hanseul Cho
Sohee Yang
Miyoung Ko
Hyeonbin Hwang
Seungpil Won
Dohaeng Lee
Youbin Ahn
Minjoon Seo
70
0
0
26 May 2025
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
Junnan Liu
Hongwei Liu
Linchen Xiao
Shudong Liu
Taolin Zhang
Zihan Ma
Songyang Zhang
Kai Chen
LRM
143
0
0
26 May 2025
MT³: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning
Zhaopeng Feng
Yupu Liang
Shaosheng Cao
Jiayuan Su
Jiahan Ren
Zhe Xu
Yao Hu
Wenxuan Huang
Jian Wu
Zuozhu Liu
VLM
LRM
116
0
0
26 May 2025
Learning to Reason without External Rewards
Xuandong Zhao
Zhewei Kang
Aosong Feng
Sergey Levine
Dawn Song
OffRL
ReLM
LRM
148
8
0
26 May 2025
ARM: Adaptive Reasoning Model
Siye Wu
Jian Xie
Yikai Zhang
Aili Chen
Kai Zhang
Yu Su
Yanghua Xiao
LRM
84
0
0
26 May 2025
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs
Hanting Chen
Jiarui Qin
Jialong Guo
Tao Yuan
Yichun Yin
...
Can Chen
Xinghao Chen
Fisher Yu
Ruiming Tang
Yunhe Wang
74
0
0
26 May 2025
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leander Diaz-Bone
Marco Bagatella
Jonas Hübotter
Andreas Krause
OffRL
100
0
0
26 May 2025
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
Yixin Cui
Haotian Lin
Shuo Yang
Yixiao Wang
Yanjun Huang
Hong Chen
LM&Ro
LRM
ELM
135
0
0
26 May 2025
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
Zeyi Huang
Zeyi Huang
Anirudh Sundara Rajan
Zefan Cai
Wen Xiao
Junjie Hu
Yong Jae Lee
82
0
0
26 May 2025
LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation
Weikang Yuan
Kaisong Song
Zhuoren Jiang
Junjie Cao
Y. Zhang
Jun Lin
Kun Kuang
Ji Zhang
Xiaozhong Liu
AILaw
ELM
26
0
0
26 May 2025
What Can RL Bring to VLA Generalization? An Empirical Study
Jijia Liu
Feng Gao
Bingwen Wei
Xinlei Chen
Qingmin Liao
Yi Wu
Chao Yu
Yu Wang
OffRL
330
0
0
26 May 2025
RFTF: Reinforcement Fine-tuning for Embodied Agents with Temporal Feedback
Junyang Shu
Zhiwei Lin
Yongtao Wang
53
0
0
26 May 2025
Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners
Jiabao Ji
Yongchao Chen
Yang Zhang
Ramana Rao Kompella
Chuchu Fan
Gaowen Liu
Shiyu Chang
126
0
0
26 May 2025
Temporal Sampling for Forgotten Reasoning in LLMs
Yuetai Li
Zhangchen Xu
Fengqing Jiang
Bhaskar Ramasubramanian
Luyao Niu
Bill Yuchen Lin
Xiang Yue
Radha Poovendran
CLL
KELM
LRM
81
0
0
26 May 2025
Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models
Baihui Zheng
Boren Zheng
Kerui Cao
Y. Tan
Zhendong Liu
...
Jian Yang
Wenbo Su
Xiaoyong Zhu
Bo Zheng
Kaifu Zhang
ELM
90
0
0
26 May 2025
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model
Tianle Li
Jihai Zhang
Yongming Rao
Yu Cheng
CoGe
LRM
VLM
108
0
0
26 May 2025
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue
Yichun Feng
Jiawei Wang
Lu Zhou
Yixue Li
OffRL
LM&MA
226
0
0
26 May 2025
Zero-Trust Foundation Models: A New Paradigm for Secure and Collaborative Artificial Intelligence for Internet of Things
Kai Li
Conggai Li
Xin Yuan
Shenghong Li
Sai Zou
...
W. Ni
Dusit Niyato
Abbas Jamalipour
Falko Dressler
Ozgur B. Akan
AI4CE
39
0
0
26 May 2025
Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning
Ruolin Shen
Xiaozhong Ji
Kai WU
Jiangning Zhang
Yijun He
HaiHua Yang
Xiaobin Hu
Xiaoyu Sun
82
0
0
26 May 2025
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement
Bingguang Hao
Maolin Wang
Zengzhuang Xu
Cunyin Peng
Yicheng Chen
Xiangyu Zhao
Jinjie Gu
Chenyi Zhuang
ReLM
LRM
127
0
0
26 May 2025
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Ibragim Badertdinov
Alexander Golubev
Maksim Nekrashevich
Anton Shevtsov
Simon Karasik
Andrei Andriushchenko
Maria Trofimova
Daria Litvintseva
Boris Yangel
59
0
0
26 May 2025
CaseEdit: Enhancing Localized Commonsense Reasoning via Null-Space Constrained Knowledge Editing in Small Parameter Language Models
Varun Reddy
Yen-Ling Kuo
KELM
61
0
0
26 May 2025
The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants
Yiqun Zhang
Hao Li
Chenxu Wang
L. Chen
Qiaosheng Zhang
...
Xinrun Wang
Jia Xu
Lei Bai
Wanli Ouyang
Shuyue Hu
84
0
0
26 May 2025
Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition
Zihao Zeng
Xuyao Huang
Boxiu Li
Hao Zhang
Zhijie Deng
ReLM
LRM
56
0
0
26 May 2025
Do Large Language Models (Really) Need Statistical Foundations?
Weijie Su
287
0
0
25 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
195
1
0
25 May 2025
Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning
Shaohao Rui
Kaitao Chen
Weijie Ma
Xiaosong Wang
OffRL
LRM
33
0
0
25 May 2025
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu
Fan Wu
Haotian Ye
David A. Forsyth
James Y. Zou
Nan Jiang
Jiaqi W. Ma
Han Zhao
OffRL
82
0
0
25 May 2025
When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas
Steffen Backmann
David Guzman Piedrahita
Emanuel Tewolde
Rada Mihalcea
Bernhard Schölkopf
Zhijing Jin
108
0
0
25 May 2025
Reinforced Latent Reasoning for LLM-based Recommendation
Yang Zhang
Wenxin Xu
Xiaoyan Zhao
Wenjie Wang
Fuli Feng
Xiangnan He
Tat-Seng Chua
OffRL
LRM
64
2
0
25 May 2025
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu
Rongzhen Wang
Shen Nie
Xiaolu Zhang
Chunwei Wu
...
Jun Zhou
Jianfei Chen
Yankai Lin
Ji-Rong Wen
Chongxuan Li
199
2
0
25 May 2025