2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Title
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
Haoyuan Wu
Rui Ming
Jilong Gao
Hangyu Zhao
Xueyi Chen
Yikai Yang
Haisheng Zheng
Zhuolun He
Bei Yu
129
0
0
19 May 2025
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
Hengli Li
Chenxi Li
Tong Wu
Xuekai Zhu
Yuxuan Wang
...
Eric Hanchen Jiang
Song-Chun Zhu
Zixia Jia
Ying Nian Wu
Zilong Zheng
LRM
130
1
0
19 May 2025
RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
Qiguang Chen
Libo Qin
Jinhao Liu
Yue Liao
Jiaqi Wang
Jingxuan Zhou
Wanxiang Che
LRM
61
0
0
19 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
226
1
0
19 May 2025
ReEx-SQL: Reasoning with Execution-Aware Reinforcement Learning for Text-to-SQL
Yaxun Dai
Wenxuan Xie
Xialie Zhuang
Tianyu Yang
Yiying Yang
Haiqin Yang
Yuhang Zhao
Pingfu Chao
Wenhao Jiang
ReLM
LRM
142
0
0
19 May 2025
Thinkless: LLM Learns When to Think
Gongfan Fang
Xinyin Ma
Xinchao Wang
LLMAG
OffRL
ReLM
LRM
160
3
0
19 May 2025
R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
Ali Naseh
Harsh Chaudhari
Jaechul Roh
Mingshi Wu
Alina Oprea
Amir Houmansadr
AAML
ELM
134
2
0
19 May 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Ziyang Ma
Yinghao Ma
Yanqiao Zhu
Chen Yang
Yi-Wen Chao
...
Wei Xue
Emmanouil Benetos
Kai Yu
Xiaofeng Wang
Xie Chen
AuLLM
LRM
124
1
0
19 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
262
0
0
19 May 2025
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu
Xueyi Chen
Rui Ming
Jilong Gao
Shoubo Hu
Zhuolun He
Bei Yu
LRM
142
0
0
19 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM
LRM
174
0
0
19 May 2025
Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference
Jin Du
Li Chen
Xun Xian
An Luo
Fangqiao Tian
Ganghua Wang
Charles Doss
Xiaotong Shen
Jie Ding
CML
ELM
66
0
0
19 May 2025
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities
Haoyu Zhao
Yihan Geng
Shange Tang
Yong Lin
Bohan Lyu
Hongzhou Lin
Chi Jin
Sanjeev Arora
108
0
0
19 May 2025
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Soumya Rani Samineni
Durgesh Kalwar
Karthik Valmeekam
Kaya Stechly
Subbarao Kambhampati
OffRL
119
1
0
19 May 2025
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Penghui Qi
Zichen Liu
Tianyu Pang
Chao Du
W. Lee
Min Lin
OffRL
LRM
106
3
0
19 May 2025
CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
Guoheng Sun
Ziyao Wang
Bowei Tian
Meng Liu
Zheyu Shen
Shwai He
Yexiao He
Wanghao Ye
Yiting Wang
Ang Li
LRM
70
0
0
19 May 2025
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
106
3
0
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
134
0
0
18 May 2025
A Survey of Attacks on Large Language Models
Wenrui Xu
Keshab K. Parhi
AAML
ELM
92
0
0
18 May 2025
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza
Nawaf Alampara
Martiño Ríos-García
Mohamed Abdelalim
Jack Butler
...
Mark Worrall
Adamo Young
Philippe Schwaller
Michael Pieler
Kevin Maik Jablonka
156
0
0
18 May 2025
Fixed Point Explainability
Emanuele La Malfa
Jon Vadillo
Marco Molinari
Michael Wooldridge
159
0
0
18 May 2025
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo
Zhaomin Wu
Philip S. Yu
105
0
0
18 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
Yangou Ouyang
...
Wentao Zhang
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
95
0
0
18 May 2025
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Junxiao Yang
Jinzhe Tu
Haoran Liu
Xiaoce Wang
Chujie Zheng
...
Caishun Chen
Tiantian He
Hongning Wang
Yew-Soon Ong
Minlie Huang
LRM
112
0
0
18 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
128
1
0
18 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Donglin Wang
LRM
222
0
0
18 May 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan
Jian Zhang
K. Li
Zhuoxuan Cai
Lujian Yao
...
Enguang Wang
Qibin Hou
Jinwei Chen
Peng-Tao Jiang
Bo Li
137
1
0
18 May 2025
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Maoyuan Ye
Jing Zhang
Juhua Liu
Bo Du
Dacheng Tao
LRM
207
0
0
18 May 2025
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng
Quan Wei
William Brown
Oana Frunza
Yuriy Nevmyvaka
Mingyi Hong
LRM
122
2
0
17 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
158
3
0
17 May 2025
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
Jiarui Wang
Huiyu Duan
Ziheng Jia
Yu Zhao
Woo Yi Yang
...
Zhongfu Chen
Juntong Wang
Yuke Xing
Guangtao Zhai
Xiongkuo Min
VGen
86
1
0
17 May 2025
Search-Based Correction of Reasoning Chains for Language Models
Minsu Kim
Jean-Pierre Falet
Oliver E. Richardson
Xiaoyin Chen
Moksh Jain
Sungjin Ahn
Sungsoo Ahn
Yoshua Bengio
KELM
ReLM
LRM
97
0
0
17 May 2025
Reasoning Large Language Model Errors Arise from Hallucinating Critical Problem Features
Alex Heyman
Joel Zylberberg
ReLM
HILM
LRM
54
0
0
17 May 2025
Equally Critical: Samples, Targets, and Their Mappings in Datasets
Runkang Yang
Peng Sun
Xinyi Shang
Yi Tang
Tao R. Lin
44
0
0
17 May 2025
VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
Yiting Wang
Guoheng Sun
Wanghao Ye
Gang Qu
Ang Li
OffRL
3DV
LRM
VLM
100
0
0
17 May 2025
Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission
Seungeun Oh
Jinhyuk Kim
Jihong Park
Seung-Woo Ko
Jinho Choi
Tony Q. S. Quek
Seong-Lyun Kim
76
0
0
17 May 2025
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
Guijin Son
Jiwoo Hong
Honglu Fan
Heejeong Nam
Hyunwoo Ko
...
Jinyeop Song
Jinha Choi
Gonçalo Paulo
Youngjae Yu
Stella Biderman
118
1
0
17 May 2025
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu
Zekun Li
Zheqi He
Peipei Li
Shuhan Xia
Xing Cui
Huaibo Huang
Xi Yang
Ran He
EGVM
AAML
101
1
0
17 May 2025
CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction
Jing Zou
Qingqiu Li
Chenyu Lian
Lihao Liu
Xiaohan Yan
Shujun Wang
Jing Qin
VLM
186
0
0
17 May 2025
Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning
Yuheng Lu
ZiMeng Bai
Caixia Yuan
Huixing Jiang
Xiaojie Wang
LRM
101
0
0
17 May 2025
TinyRS-R1: Compact Multimodal Language Model for Remote Sensing
Aybora Koksal
A. Aydin Alatan
LRM
64
0
0
17 May 2025
SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation
Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
103
0
0
17 May 2025
IQBench: How "Smart" Are Vision-Language Models? A Study with Human IQ Tests
Tan-Hanh Pham
Phu-Vinh Nguyen
Dang The Hung
Bui Trong Duong
Vu Nguyen Thanh
Chris Ngo
Tri Quang Truong
Truong-Son Hy
ReLM
CoGe
VLM
LRM
69
0
0
17 May 2025
Evaluating the Logical Reasoning Abilities of Large Reasoning Models
Hanmeng Liu
Yiran Ding
Zhizhang Fu
Chaoli Zhang
Xiaozhang Liu
Yue Zhang
ELM
LRM
77
1
0
17 May 2025
CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
Hongbo Jin
Ruyang Liu
Wenhao Zhang
Guibo Luo
Ge Li
LRM
115
0
0
17 May 2025
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
Fanqi Lin
Ruiqian Nai
Yingdong Hu
Jiacheng You
Junming Zhao
Yang Gao
LRM
115
0
0
17 May 2025
HessFormer: Hessians at Foundation Scale
Diego Granziol
127
0
0
16 May 2025
Visual Planning: Let's Think Only with Images
Yi Xu
Chengzu Li
Han Zhou
Xingchen Wan
Caiqi Zhang
Anna Korhonen
Ivan Vulić
LM&Ro
LRM
176
1
0
16 May 2025
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Yansheng Qiu
Li Xiao
Zhaopan Xu
Pengfei Zhou
Zheng Wang
Jianchao Tan
ELM
LRM
169
0
0
16 May 2025
On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms
Jacob Trauger
Ambuj Tewari
72
0
0
16 May 2025