2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Title
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
Zhenwen Liang
Linfeng Song
Yang Li
Tao Yang
Feng Zhang
Haitao Mi
Dong Yu
LRM
150
2
0
16 May 2025
CAMEO: Collection of Multilingual Emotional Speech Corpora
Iwona Christop
Maciej Czajka
115
1
0
16 May 2025
Token-Level Uncertainty Estimation for Large Language Model Reasoning
Tunyu Zhang
Haizhou Shi
Yibin Wang
Hengyi Wang
Xiaoxiao He
...
Ligong Han
Kai Xu
Huatian Zhang
Dimitris N. Metaxas
Hao Wang
LRM
129
0
0
16 May 2025
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese
Xinyu Wang
Ziyi Zhao
Siyu Ren
Shao Zhang
Song Li
...
Lin Qiu
Guanglu Wan
Xuezhi Cao
Xunliang Cai
Weinan Zhang
ALM
127
0
0
16 May 2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu
Jiahao Lin
Qichao Zhang
Xiangyu Tian
Linjing Li
Xiangyuan Lan
Dongbin Zhao
OffRL
ReLM
LRM
103
2
0
16 May 2025
PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
Falong Fan
Xi Li
LLMAG
AAML
97
0
0
16 May 2025
HessFormer: Hessians at Foundation Scale
Diego Granziol
127
0
0
16 May 2025
On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms
Jacob Trauger
Ambuj Tewari
72
0
0
16 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS
LRM
168
0
0
16 May 2025
REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning
Pawin Taechoyotin
Daniel Acuna
LRM
94
0
0
16 May 2025
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELM
AI4MH
LM&MA
LRM
133
2
0
16 May 2025
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models
Xiaomin Li
Mingye Gao
Yuexing Hao
Taoran Li
Guangya Wan
Zihan Wang
Yijun Wang
LM&MA
ELM
AI4MH
148
0
0
16 May 2025
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Francesco Sovrano
167
2
0
16 May 2025
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
Anjiang Wei
Tarun Suresh
Huanmi Tan
Yinglun Xu
Gagandeep Singh
Ke Wang
Alex Aiken
80
0
0
16 May 2025
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs
Zijia Liu
Peixuan Han
Haofei Yu
Haoru Li
Jiaxuan You
AI4TS
LRM
209
1
0
16 May 2025
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs
Sijia Chen
Xiaomin Li
Mengxue Zhang
Eric Hanchen Jiang
Qingcheng Zeng
Chen-Hsiang Yu
AAML
MU
ELM
144
0
0
16 May 2025
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
Yige Xu
Xu Guo
Zhiwei Zeng
Chunyan Miao
BDL
LRM
156
1
0
16 May 2025
VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification
Patrick Yubeaton
Andre Nakkab
Weihua Xiao
Luca Collini
Ramesh Karri
Chinmay Hegde
Siddharth Garg
LRM
39
1
0
16 May 2025
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Yansheng Qiu
Li Xiao
Zhaopan Xu
Pengfei Zhou
Zheng Wang
Jianchao Tan
ELM
LRM
169
0
0
16 May 2025
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
Chengyu Huang
Zhengxin Zhang
Claire Cardie
LRM
136
0
0
16 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
179
0
0
16 May 2025
CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks
Christoph Leiter
Yuki M. Asano
Margret Keuper
Steffen Eger
64
0
0
16 May 2025
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
Hao Mark Chen
Guanxi Lu
Yasuyuki Okoshi
Zhiwen Mo
Masato Motomura
Hongxiang Fan
LRM
130
0
0
16 May 2025
Noise Injection Systemically Degrades Large Language Model Safety Guardrails
Prithviraj Singh Shahani
Matthias Scheutz
AAML
125
0
0
16 May 2025
Superposition Yields Robust Neural Scaling
Yizhou Liu
Ziming Liu
Jeff Gore
MILM
170
1
0
15 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
118
2
0
15 May 2025
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Zhiyuan Hu
Yansen Wang
Hanze Dong
Yuhui Xu
Amrita Saha
Caiming Xiong
Bryan Hooi
Junnan Li
LRM
111
2
0
15 May 2025
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang
Zhiyang Chen
Zijun Wang
Tiancheng Li
Guo-Jun Qi
DiffM
LRM
AI4CE
108
2
0
15 May 2025
Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data
Adel ElZemity
Budi Arief
Shujun Li
89
0
0
15 May 2025
ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data
Chengsen Wang
Qi Qi
Zhongwen Rao
Lujia Pan
Jingyu Wang
Jianxin Liao
AI4TS
75
0
0
15 May 2025
Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks
Ziyuan Zhang
Darcy Wang
Ningyuan Chen
Rodrigo Mansur
Vahid Sarhangian
182
0
0
15 May 2025
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Seongyun Lee
Seungone Kim
Minju Seo
Yongrae Jo
Dongyoung Go
...
Xiang Yue
Sean Welleck
Graham Neubig
Moontae Lee
Minjoon Seo
LRM
114
1
0
15 May 2025
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
110
6
0
15 May 2025
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging
Hongjin Qian
Zhengyang Liang
RALM
LRM
177
0
0
14 May 2025
Multilingual Machine Translation with Quantum Encoder Decoder Attention-based Convolutional Variational Circuits
Subrit Dikshit
Ritu Tiwari
Priyank Jain
77
0
0
14 May 2025
A Data Synthesis Method Driven by Large Language Models for Proactive Mining of Implicit User Intentions in Tourism
Jinqiang Wang
Huansheng Ning
Tao Zhu
Jianguo Ding
88
0
0
14 May 2025
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Andrew Rouditchenko
Saurabhchand Bhati
Edson Araujo
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
VLM
111
0
0
14 May 2025
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
139
100
0
14 May 2025
How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference
Nidhal Jegham
Marwen Abdelatti
Lassad Elmoubarki
Abdeltawab Hendawi
94
0
0
14 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yongqian Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM
ALM
129
0
0
14 May 2025
InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials
Xiao-Qi Han
Peng-Jie Guo
Ze-Feng Gao
Hao Sun
Zhong-Yi Lu
AI4CE
71
0
0
14 May 2025
Memorization-Compression Cycles Improve Generalization
Fangyuan Yu
83
0
0
13 May 2025
Evaluating the Effectiveness of Black-Box Prompt Optimization as the Scale of LLMs Continues to Grow
Ziyu Zhou
Yihang Wu
J. Yang
Zhan Xiao
Rongjun Li
LRM
70
0
0
13 May 2025
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
Xiaoyang Chen
Xinan Dai
Yu Du
Qian Feng
Naixu Guo
...
Jinfeng Xu
Yiyang Yu
Zhiyong Yang
Hongji Zha
Ruichong Zhang
LRM
72
1
0
13 May 2025
LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs
K M Sajjadul Islam
Ayesha Siddika Nipu
Jiawei Wu
Praveen Madiraju
130
0
0
13 May 2025
FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs
Pencuo Zeren
Qiuming Luo
Rui Mao
Chang Kong
33
0
0
13 May 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su
Linjie Li
Mingyang Song
Yunzhuo Hao
Zhengyuan Yang
...
Guanjie Chen
Jiawei Gu
Juntao Li
Xiaoye Qu
Yu Cheng
OffRL
LRM
101
11
0
13 May 2025
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
Ben Yao
Qiuchi Li
Yazhou Zhang
Siyu Yang
Bohan Zhang
Prayag Tiwari
Jing Qin
130
0
0
13 May 2025
CellTypeAgent: Trustworthy cell type annotation with Large Language Models
Jiawen Chen
Jing Zhang
Huaxiu Yao
Yun Li
LLMAG
50
0
0
13 May 2025
Re²: A Consistency-ensured Dataset for Full-stage Peer Review and Multi-turn Rebuttal Discussions
Daoze Zhang
Zhijian Bao
S. Du
Zhiyi Zhao
Kuangling Zhang
Dezheng Bao
Yang Yang
63
1
0
12 May 2025