2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Title
Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform
Jay Roberts
Kyle Mylonakis
Sidhartha Roy
Kaan Kale
77
0
0
11 Jun 2025
SANGAM: SystemVerilog Assertion Generation via Monte Carlo Tree Self-Refine
Adarsh Gupta
Bhabesh Mali
C. Karfa
LRM
19
0
0
11 Jun 2025
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang
Yuwei An
Hongyi Liu
Tianqi Chen
Beidi Chen
SyDa
LRM
189
0
0
11 Jun 2025
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Yu Sun
Xingyu Qian
Weiwen Xu
Hao Zhang
Chenghao Xiao
Long Li
Yu Rong
Wenbing Huang
Qifeng Bai
Tingyang Xu
LRM
79
0
0
11 Jun 2025
KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs
Dingjun Wu
Y. Yan
Zhenghao Liu
Zhiyuan Liu
Maosong Sun
73
0
0
11 Jun 2025
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
Zijie Wu
Chaohui Yu
Fan Wang
Xiang Bai
AI4CE
65
0
0
11 Jun 2025
Know What You Don't Know: Uncertainty Calibration of Process Reward Models
Young-Jin Park
Kristjan Greenewald
Kaveh Alim
Hao Wang
Navid Azizan
LRM
71
0
0
11 Jun 2025
One Patient, Many Contexts: Scaling Medical AI Through Contextual Intelligence
Michelle M. Li
Ben Y. Reis
Adam Rodman
Tianxi Cai
Noa Dagan
Ran D. Balicer
J. Loscalzo
I. Kohane
Marinka Zitnik
LRM
61
0
0
11 Jun 2025
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
Yuki Imajuku
Kohki Horie
Yoichi Iwata
Kensho Aoki
Naohiro Takahashi
Takuya Akiba
25
0
0
10 Jun 2025
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Polina Kirichenko
Mark Ibrahim
Kamalika Chaudhuri
Samuel J. Bell
LRM
27
0
0
10 Jun 2025
Reinforcement Learning Teachers of Test Time Scaling
Edoardo Cetin
Tianyu Zhao
Yujin Tang
OffRL
ReLM
LRM
70
0
0
10 Jun 2025
Stronger Language Models Produce More Human-Like Errors
Andrew Keenan Richardson
Ryan Othniel Kearns
Sean Moss
Vincent Wang-Maścianica
Philipp Koralus
ReLM
LRM
26
0
0
10 Jun 2025
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
Luel Hagos Beyene
Vivek Verma
Min Ma
Jesujoba Oluwadara Alabi
Fabian David Schmidt
Joyce Nakatumba-Nabende
David Ifeoluwa Adelani
59
0
0
10 Jun 2025
Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
Sunil Kumar
Bowen Zhao
Leo Parker Dirac
Paulina Varshavskaya
LRM
32
0
0
10 Jun 2025
ThinkQE: Query Expansion via an Evolving Thinking Process
Yibin Lei
Tao Shen
Andrew Yates
ReLM
LRM
45
0
0
10 Jun 2025
ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
Amirreza Rouhi
Solmaz Arezoomandan
Knut Peterson
Joseph T. Woods
David Han
VLM
51
0
0
10 Jun 2025
TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning
Mingyu Zheng
Zhifan Feng
Jia Wang
Lanrui Wang
Zheng Lin
Yang Hao
Weiping Wang
LMTD
55
0
0
10 Jun 2025
DeepForm: Reasoning Large Language Model for Communication System Formulation
Panlong Wu
Ting Wang
Yifei Zhong
Haoqi Zhang
Zitong Wang
Fangxin Wang
OffRL
LRM
68
0
0
10 Jun 2025
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Yizhao Gao
Shuming Guo
Shijie Cao
Yuqing Xia
Yu Cheng
...
Hayden Kwok-Hay So
Yu Hua
Ting Cao
Fan Yang
Mao Yang
VLM
LRM
45
0
0
10 Jun 2025
Trustworthy AI for Medicine: Continuous Hallucination Detection and Elimination with CHECK
Carlos Garcia-Fernandez
Luis Felipe
Monique Shotande
Muntasir Zitu
Aakash Tripathi
Ghulam Rasool
Issam El Naqa
Vivek Rudrapatna
Gilmer Valdes
HILM
LM&MA
29
0
0
10 Jun 2025
TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
Zengjue Chen
Runliang Niu
He Kong
Qi Wang
68
0
0
10 Jun 2025
TACTIC: Translation Agents with Cognitive-Theoretic Interactive Collaboration
Weiya Li
Junjie Chen
Bei Li
Boyang Liu
Zichen Wen
...
Xiaoqian Liu
Anping Liu
Huajie Liu
Hu Song
Linfeng Zhang
LLMAG
42
0
0
10 Jun 2025
ABC-FHE: A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption
Sungwoong Yune
Hyojeong Lee
Adiwena Putra
Hyunjun Cho
Cuong Duong Manh
Jaeho Jeon
Joo-Young Kim
28
0
0
10 Jun 2025
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Amrith Rajagopal Setlur
Matthew Y. R. Yang
Charlie Snell
Jeremy Greer
Ian Wu
Virginia Smith
Max Simchowitz
Aviral Kumar
LRM
55
0
0
10 Jun 2025
How to Provably Improve Return Conditioned Supervised Learning?
Zhishuai Liu
Yu Yang
Ruhan Wang
Pan Xu
Dongruo Zhou
OffRL
41
0
0
10 Jun 2025
A Survey on Large Language Models for Mathematical Reasoning
Peng-Yuan Wang
Tian-Shuo Liu
Chenyang Wang
Yi-Di Wang
Shu Yan
...
Xu-Hui Liu
Xin-Wei Chen
Jia-Cheng Xu
Ziniu Li
Yang Yu
LRM
48
0
0
10 Jun 2025
Can Artificial Intelligence Write Like Borges? An Evaluation Protocol for Spanish Microfiction
Gerardo Aleman Manzanarez
Nora de la Cruz Arana
Jorge Garcia Flores
Yobany Garcia Medina
Raul Monroy
Nathalie Pernelle
22
0
0
09 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
50
0
0
09 Jun 2025
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu
L. Jiang
Yancheng Liang
S. Du
Yejin Choi
Tim Althoff
Natasha Jaques
AAML
LRM
39
0
0
09 Jun 2025
SEED: Enhancing Text-to-SQL Performance and Practical Usability Through Automatic Evidence Generation
Janghyeon Yun
Sang-goo Lee
17
0
0
09 Jun 2025
OpenDance: Multimodal Controllable 3D Dance Generation Using Large-scale Internet Data
Jinlu Zhang
Zixi Kang
Yizhou Wang
36
0
0
09 Jun 2025
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal
Reza Shirkavand
Heng-Chiao Huang
Gowthami Somepalli
Tom Goldstein
49
0
0
09 Jun 2025
Towards a Small Language Model Lifecycle Framework
Parsa Miraghaei
Sergio Moreschini
Antti Kolehmainen
David Hästbacka
20
0
0
09 Jun 2025
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park
Jeehye Na
Jinyoung Kim
H. Kim
OffRL
18
0
0
09 Jun 2025
HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains
Shijie Wang
Yilun Zhang
Zeyu Lai
Dexing Kong
36
0
0
09 Jun 2025
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
Junhong Shen
Hao Bai
Lunjun Zhang
Yifei Zhou
Amrith Rajagopal Setlur
...
Diego Caples
Nan Jiang
Tong Zhang
Ameet Talwalkar
Aviral Kumar
LLMAG
LRM
34
0
0
09 Jun 2025
Explicit Preference Optimization: No Need for an Implicit Reward Model
Xiangkun Hu
Lemin Kong
Tong He
David Wipf
38
0
0
09 Jun 2025
Event-Priori-Based Vision-Language Model for Efficient Visual Understanding
Haotong Qin
Cheng Hu
Michele Magno
VLM
31
0
0
09 Jun 2025
Improving Large Language Models with Concept-Aware Fine-Tuning
Michael K. Chen
Xikun Zhang
Jiaxing Huang
Dacheng Tao
31
0
0
09 Jun 2025
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
Penghao Wu
Shengnan Ma
Bo Wang
Jiaheng Yu
Lewei Lu
Ziwei Liu
36
0
0
09 Jun 2025
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Lu Ma
Hao Liang
Meiyi Qiang
Lexiang Tang
Xiaochen Ma
...
Junbo Niu
Chengyu Shen
Runming He
Bin Cui
Wentao Zhang
ReLM
OffRL
LRM
38
0
0
09 Jun 2025
LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
Yixuan Yang
Zhen Luo
Tongsheng Ding
Junru Lu
Mingqi Gao
Jinyu Yang
Victor Sanchez
Feng Zheng
3DV
36
0
0
09 Jun 2025
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Roy Eisenstadt
Itamar Zimerman
Lior Wolf
LRM
28
0
0
08 Jun 2025
HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance
Lei Li
Angela Dai
34
0
0
08 Jun 2025
Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings
Rong-Xi Tan
Ming Chen
Ke Xue
Yao Wang
Yaoyuan Wang
Sheng Fu
Chao Qian
OffRL
37
0
0
08 Jun 2025
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation
Jaechul Roh
Varun Gandhi
Shivani Anilkumar
Arin Garg
AAML
ReLM
LRM
48
0
0
08 Jun 2025
QForce-RL: Quantized FPGA-Optimized Reinforcement Learning Compute Engine
Anushka Jha
Tanushree Dewangan
Mukul Lokhande
Santosh Kumar Vishvakarma
35
0
0
08 Jun 2025
SCGAgent: Recreating the Benefits of Reasoning Models for Secure Code Generation with Agentic Workflows
Rebecca Saul
Hao Wang
Koushik Sen
David Wagner
LLMAG
20
0
0
08 Jun 2025
How Far Are We from Optimal Reasoning Efficiency?
Jiaxuan Gao
Shu Yan
Qixin Tan
Lu Yang
Shusheng Xu
Wei Fu
Zhiyu Mei
Kaifeng Lyu
Yi Wu
LRM
37
0
0
08 Jun 2025
DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains
Zhihui Chen
Kai He
Yucheng Huang
Yunxiao Zhu
Mengling Feng
DeLMO
MedIm
35
0
0
07 Jun 2025