Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Z. Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Jianxin Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Renqi Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 753 papers shown
Title
Improved Methods for Model Pruning and Knowledge Distillation
Wei Jiang
Anying Fu
Youling Zhang
VLM
7
0
0
20 May 2025
ReEx-SQL: Reasoning with Execution-Aware Reinforcement Learning for Text-to-SQL
Yaxun Dai
Wenxuan Xie
Xialie Zhuang
Tianyu Yang
Yiying Yang
Haiqin Yang
Yuhang Zhao
Pingfu Chao
Wenhao Jiang
ReLM
LRM
27
0
0
19 May 2025
R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
Ali Naseh
Harsh Chaudhari
Jaechul Roh
Mingshi Wu
Alina Oprea
Amir Houmansadr
AAML
ELM
12
0
0
19 May 2025
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
Chenlin Ming
Chendi Qu
Mengzhang Cai
Qizhi Pei
Zhuoshi Pan
Yu Li
Xiaoming Duan
Lijun Wu
Zeang Sheng
12
0
0
19 May 2025
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu
Xueyi Chen
Rui Ming
Jilong Gao
Shoubo Hu
Zhuolun He
Bei Yu
LRM
24
0
0
19 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Y. Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
17
0
0
19 May 2025
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities
Haoyu Zhao
Yihan Geng
Shange Tang
Yong Lin
Bohan Lyu
Hongzhou Lin
Chi Jin
Sanjeev Arora
9
0
0
19 May 2025
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
Haoyuan Wu
Rui Ming
Jilong Gao
Hangyu Zhao
Xueyi Chen
Yikai Yang
Haisheng Zheng
Zhuolun He
Bei Yu
16
0
0
19 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
12
0
0
19 May 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Ziyang Ma
Yinghao Ma
Yanqiao Zhu
Chen Yang
Yi-Wen Chao
...
Wei Xue
Emmanouil Benetos
Kai Yu
Eng Siong Chng
Xie Chen
AuLLM
LRM
12
0
0
19 May 2025
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
Haiquan Wen
Yiwei He
Zhenglin Huang
Tianxiao Li
Zihan YU
Xingru Huang
Lu Qi
Baoyuan Wu
Xuelong Li
Guangliang Cheng
VGen
9
0
0
19 May 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw
ELM
26
0
0
19 May 2025
Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective
Zhongxiang Sun
Qipeng Wang
Haoyu Wang
Xiao Zhang
Jun Xu
HILM
LRM
9
0
0
19 May 2025
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Maoyuan Ye
Jing Zhang
Juhua Liu
Bo Du
Dacheng Tao
LRM
4
0
0
18 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
YangouOuyang
...
Feiyu Xiong
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
7
0
0
18 May 2025
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
12
0
0
18 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Donglin Wang
LRM
20
0
0
18 May 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan
Jian Zhang
K. Li
Zhuoxuan Cai
Lujian Yao
...
Enguang Wang
Qibin Hou
Jinwei Chen
Peng-Tao Jiang
Bo Li
7
0
0
18 May 2025
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo
Zhaomin Wu
Philip S. Yu
4
0
0
18 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
9
0
0
18 May 2025
A Survey of Attacks on Large Language Models
Wenrui Xu
Keshab K. Parhi
AAML
ELM
14
0
0
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
12
0
0
18 May 2025
Fixed Point Explainability
Emanuele La Malfa
Jon Vadillo
Marco Molinari
Michael Wooldridge
7
0
0
18 May 2025
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza
Nawaf Alampara
Martiño Ríos-García
Mohamed Abdelalim
Jack Butler
...
Mark Worrall
Adamo Young
Philippe Schwaller
Michael Pieler
Kevin Maik Jablonka
7
0
0
18 May 2025
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng
Quan Wei
William Brown
Oana Frunza
Yuriy Nevmyvaka
Mingyi Hong
LRM
12
0
0
17 May 2025
CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction
Jing Zou
Qingqiu Li
Chenyu Lian
Lihao Liu
Xiaohan Yan
Shujun Wang
Jing Qin
VLM
2
0
0
17 May 2025
VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
Yiting Wang
Guoheng Sun
Wanghao Ye
Gang Qu
Ang Li
OffRL
3DV
LRM
VLM
12
0
0
17 May 2025
Search-Based Correction of Reasoning Chains for Language Models
Minsu Kim
Jean-Pierre Falet
Oliver E. Richardson
Xiaoyin Chen
Moksh Jain
Sungjin Ahn
Sungsoo Ahn
Yoshua Bengio
KELM
ReLM
LRM
9
0
0
17 May 2025
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu
Zekun Li
Zheqi He
Peipei Li
Shuhan Xia
Xing Cui
Huaibo Huang
Xi Yang
Ran He
EGVM
AAML
23
0
0
17 May 2025
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
Guijin Son
Jiwoo Hong
Honglu Fan
Heejeong Nam
Hyunwoo Ko
...
Jinyeop Song
Jinha Choi
Gonçalo Paulo
Youngjae Yu
Stella Biderman
7
0
0
17 May 2025
Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning
Yuheng Lu
ZiMeng Bai
Caixia Yuan
Huixing Jiang
Xiaojie Wang
LRM
7
0
0
17 May 2025
Reasoning Large Language Model Errors Arise from Hallucinating Critical Problem Features
Alex Heyman
Joel Zylberberg
ReLM
HILM
LRM
2
0
0
17 May 2025
TinyRS-R1: Compact Multimodal Language Model for Remote Sensing
Aybora Koksal
A. Aydin Alatan
LRM
7
0
0
17 May 2025
IQBench: How "Smart'' Are Vision-Language Models? A Study with Human IQ Tests
Tan-Hanh Pham
Phu-Vinh Nguyen
Dang The Hung
Bui Trong Duong
Vu Nguyen Thanh
Chris Ngo
Tri Truong
Truong-Son Hy
ReLM
CoGe
VLM
LRM
7
0
0
17 May 2025
Evaluating the Logical Reasoning Abilities of Large Reasoning Models
Hanmeng Liu
Yiran Ding
Zhizhang Fu
Chaoli Zhang
Xiaozhang Liu
Yue Zhang
ELM
LRM
2
0
0
17 May 2025
CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
Hongbo Jin
Ruyang Liu
Wenhao Zhang
Guibo Luo
Ge Li
LRM
4
0
0
17 May 2025
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
Jiarui Wang
Huiyu Duan
Ziheng Jia
Yu Zhao
Woo Yi Yang
...
Z. Chen
Juntong Wang
Yuke Xing
Guangtao Zhai
Xiongkuo Min
VGen
12
0
0
17 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
25
0
0
17 May 2025
Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission
Seungeun Oh
Jinhyuk Kim
Jihong Park
Seung-Woo Ko
Jinho Choi
Tony Q. S. Quek
Seong-Lyun Kim
9
0
0
17 May 2025
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
Fanqi Lin
Ruiqian Nai
Yingdong Hu
Jiacheng You
Junming Zhao
Yang Gao
LRM
4
0
0
17 May 2025
PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
Falong Fan
Xi Li
LLMAG
AAML
10
0
0
16 May 2025
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese
Xueliang Wang
Ziyi Zhao
Siyu Ren
Shao Zhang
Song Li
...
Lin Qiu
Guanglu Wan
Xuezhi Cao
Xunliang Cai
Weinan Zhang
ALM
32
0
0
16 May 2025
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
Zhenwen Liang
Linfeng Song
Yang Li
Tao Yang
Feng Zhang
Haitao Mi
Dong Yu
LRM
17
0
0
16 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
9
0
0
16 May 2025
On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms
Jacob Trauger
Ambuj Tewari
17
0
0
16 May 2025
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Francesco Sovrano
22
0
0
16 May 2025
Visual Planning: Let's Think Only with Images
Yi Xu
Chengzu Li
Han Zhou
Xingchen Wan
Caiqi Zhang
Anna Korhonen
Ivan Vulić
LM&Ro
LRM
17
0
0
16 May 2025
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
Yige Xu
Xu Guo
Zhiwei Zeng
Chunyan Miao
BDL
LRM
17
0
0
16 May 2025
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELM
AI4MH
LM&MA
LRM
25
0
0
16 May 2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu
Jiahao Lin
Qichao Zhang
Xiangyu Tian
Linjing Li
Xiangyuan Lan
Dongbin Zhao
OffRL
ReLM
LRM
21
0
0
16 May 2025
1
2
3
4
...
14
15
16
Next