2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Z. Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Jianxin Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Renqi Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 788 papers shown
Title
Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data
Adel ElZemity
Budi Arief
Shujun Li
37
0
0
15 May 2025
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang
Zhiyang Chen
Zijun Wang
Tiancheng Li
Guo-Jun Qi
DiffM
LRM
AI4CE
25
0
0
15 May 2025
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
19
0
0
15 May 2025
Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks
Ziyuan Zhang
Darcy Wang
Ningyuan Chen
Rodrigo Mansur
Vahid Sarhangian
24
0
0
15 May 2025
ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data
Chengsen Wang
Qi Qi
Zhongwen Rao
Lujia Pan
Jingyu Wang
Jianxin Liao
AI4TS
24
0
0
15 May 2025
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Seongyun Lee
Seungone Kim
Minju Seo
Yongrae Jo
Dongyoung Go
...
Xiang Yue
Sean Welleck
Graham Neubig
Moontae Lee
Minjoon Seo
LRM
31
0
0
15 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
37
0
0
15 May 2025
InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials
Xiao-Qi Han
Peng-Jie Guo
Ze-Feng Gao
Hao Sun
Zhong-Yi Lu
AI4CE
30
0
0
14 May 2025
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging
Hongjin Qian
Zhengyang Liang
RALM
LRM
38
0
0
14 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yong Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM
ALM
33
0
0
14 May 2025
A Data Synthesis Method Driven by Large Language Models for Proactive Mining of Implicit User Intentions in Tourism
Jinqiang Wang
Huansheng Ning
Tao Zhu
Jianguo Ding
7
0
0
14 May 2025
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Andrew Rouditchenko
Saurabhchand Bhati
Edson Araujo
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
VLM
44
0
0
14 May 2025
How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference
Nidhal Jegham
Marwen Abdelatti
Lassad Elmoubarki
Abdeltawab Hendawi
26
0
0
14 May 2025
Multilingual Machine Translation with Quantum Encoder Decoder Attention-based Convolutional Variational Circuits
Subrit Dikshit
Ritu Tiwari
Priyank Jain
24
0
0
14 May 2025
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
50
0
0
14 May 2025
LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs
K. M. S. Islam
A. S. Nipu
Jiawei Wu
Praveen Madiraju
46
0
0
13 May 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su
Linjie Li
Mingyang Song
Yunzhuo Hao
Zhengyuan Yang
...
Guanjie Chen
Jiawei Gu
Juntao Li
Xiaoye Qu
Yu Cheng
OffRL
LRM
31
0
0
13 May 2025
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
Ben Yao
Qiuchi Li
Yazhou Zhang
Siyu Yang
Bohan Zhang
Prayag Tiwari
Jing Qin
36
0
0
13 May 2025
Evaluating the Effectiveness of Black-Box Prompt Optimization as the Scale of LLMs Continues to Grow
Ziyu Zhou
Yihang Wu
J. Yang
Zhan Xiao
Rongjun Li
LRM
34
0
0
13 May 2025
Memorization-Compression Cycles Improve Generalization
Fangyuan Yu
39
0
0
13 May 2025
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
Xiaoyang Chen
Xinan Dai
Yu Du
Qian Feng
Naixu Guo
...
J. Xu
Yiyang Yu
Zhiyong Yang
Hongji Zha
Ruichong Zhang
LRM
39
1
0
13 May 2025
CellTypeAgent: Trustworthy cell type annotation with Large Language Models
Jiawen Chen
Jingyang Zhang
Huaxiu Yao
Yun Li
LLMAG
30
0
0
13 May 2025
SEM: Reinforcement Learning for Search-Efficient Large Language Models
Zeyang Sha
Shiwen Cui
Weiqiang Wang
KELM
OffRL
LRM
31
0
0
12 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
B. S.
Cici
Dawei Zhu
...
Yishuo Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE
ReLM
LRM
AI4CE
48
1
0
12 May 2025
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAML
ELM
28
0
0
12 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
X. Wu
Weinong Wang
Yingying Zhang
Wenqiang Zhang
ReLM
LRM
43
0
0
12 May 2025
MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
Aybora Koksal
A. Aydin Alatan
LRM
29
1
0
12 May 2025
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang
Weijie Xu
Fanyou Wu
Chandan K. Reddy
29
0
0
12 May 2025
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Rushi Qiang
Yuchen Zhuang
Yinghao Li
D. Kilman
Rongzhi Zhang
...
Ian Shu-Hei Wong
Sherry Yang
Percy Liang
Chao Zhang
Bo Dai
ELM
41
0
0
12 May 2025
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang
Zijian Huang
Chenshun Ni
Ziyang Xiong
Jiacheng Chen
Samet Oymak
ReLM
LRM
45
0
0
12 May 2025
Re²: A Consistency-ensured Dataset for Full-stage Peer Review and Multi-turn Rebuttal Discussions
Daoze Zhang
Zhijian Bao
S. Du
Zhiyi Zhao
Kuangling Zhang
Dezheng Bao
Yang Yang
29
0
0
12 May 2025
Injecting Knowledge Graphs into Large Language Models
Erica Coppolillo
29
0
0
12 May 2025
Evaluating Large Language Models for Real-World Engineering Tasks
Rene Heesch
Sebastian Eilermann
Alexander Windmann
Alexander Diedrich
Philipp Rosenthal
Oliver Niggemann
ELM
5
0
0
12 May 2025
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Xiaokun Wang
Chris
Jiangbo Pei
Wei Shen
Yi Peng
...
Ai Jian
Tianyidan Xie
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
LRM
28
0
0
12 May 2025
REMEDI: Relative Feature Enhanced Meta-Learning with Distillation for Imbalanced Prediction
Fei Liu
Huanhuan Ren
Yu Guan
Xiuxu Wang
Wang Lv
Zhiqiang Hu
Y. Chen
39
0
0
12 May 2025
Learning from Peers in Reasoning Models
Tongxu Luo
Wenyu Du
Jiaxi Bi
Stephen Chung
Zhengyang Tang
Hao Yang
M. Zhang
Benyou Wang
LRM
41
0
0
12 May 2025
DanceGRPO: Unleashing GRPO on Visual Generation
Zeyue Xue
Jie Wu
Yu Gao
Fangyuan Kong
Lingting Zhu
...
Zhiheng Liu
Wei Liu
Qiushan Guo
Weilin Huang
Ping Luo
EGVM
VGen
52
0
0
12 May 2025
Assessing the Chemical Intelligence of Large Language Models
Nicholas T. Runcie
Charlotte M. Deane
Fergus Imrie
ELM
LRM
40
0
0
12 May 2025
Implementing Long Text Style Transfer with LLMs through Dual-Layered Sentence and Paragraph Structure Extraction and Mapping
Yusen Wu
Xiaotie Deng
19
0
0
11 May 2025
Architectural Precedents for General Agents using Large Language Models
R. Wray
James R. Kirk
John E. Laird
LLMAG
AI4TS
AI4CE
31
0
0
11 May 2025
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
Choong Seon Hong
AI4CE
36
0
0
11 May 2025
LLM-Flock: Decentralized Multi-Robot Flocking via Large Language Models and Influence-Based Consensus
Peihan Li
Lifeng Zhou
23
0
0
10 May 2025
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
50
1
0
09 May 2025
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang
Tianyu Liu
Zhihong Zhu
Yu Wang
Haoyu Wang
Donghao Zhou
Yefeng Zheng
Kun Wang
X. Wu
Pheng-Ann Heng
ELM
36
0
0
09 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
J. Zhang
Jue Wang
Yanjie Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
60
0
0
09 May 2025
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Azim Ospanov
Farzan Farnia
Roozbeh Yousefzadeh
LRM
29
0
0
09 May 2025
Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought
Mingfei Zeng
Ming Xie
Xixi Zheng
Chunhai Li
Chuan Zhang
Liehuang Zhu
31
0
0
08 May 2025
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Harry Dong
Bilge Acun
Beidi Chen
Yuejie Chi
LRM
34
0
0
08 May 2025
GroverGPT-2: Simulating Grover's Algorithm via Chain-of-Thought Reasoning and Quantum-Native Tokenization
Min Chen
Jinglei Cheng
Pingzhi Li
Haoran Wang
Tianlong Chen
Junyu Liu
LRM
48
0
0
08 May 2025
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
48
0
0
08 May 2025