arXiv:2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Xiaokang Zhang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
R. Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing
"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 830 papers shown
A Survey on Unlearnable Data
Jiahao Li
Yiqiang Chen
Yunbing Xing
Yang Gu
Xiangyuan Lan
AAML
60
0
0
30 Mar 2025
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
Jixuan Leng
Chengsong Huang
Langlin Huang
Bill Yuchen Lin
William W. Cohen
Haohan Wang
Jiaxin Huang
LRM
56
0
0
30 Mar 2025
Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models
Sid Bharthulwar
John Rho
Katrina Brown
ReLM
VLM
LRM
61
0
0
30 Mar 2025
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva
Andrew Zisserman
59
0
0
30 Mar 2025
Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use
Nicholas Roth
Christopher Hidey
Lucas Spangher
William Arnold
Chang Ye
Nick Masiewicki
Jinoo Baek
Peter Grabowski
Eugene Ie
LLMAG
63
0
0
29 Mar 2025
A Retrieval-Augmented Knowledge Mining Method with Deep Thinking LLMs for Biomedical Research and Clinical Support
Yichun Feng
Jiawei Wang
Ruikun He
Lu Zhou
Yixue Li
RALM
62
1
0
29 Mar 2025
Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models
Zehua Liu
Han Wu
Ruifeng She
Xiaojin Fu
Xiongwei Han
Tao Zhong
Mingxuan Yuan
MoE
47
0
0
29 Mar 2025
Efficient Inference for Large Reasoning Models: A Survey
Yi Liu
Jiaying Wu
Yufei He
Hongcheng Gao
Hongyu Chen
Baolong Bi
Jiaheng Zhang
Zhiqi Huang
Bryan Hooi
LLMAG
LRM
76
8
0
29 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
59
0
0
29 Mar 2025
Task-Aware Parameter-Efficient Fine-Tuning of Large Pre-Trained Models at the Edge
Senkang Hu
Yanan Ma
Yihang Tao
Zhengru Fang
Zihan Fang
Yiqin Deng
Sam Kwong
Yuguang Fang
33
0
0
29 Mar 2025
DAT: Dynamic Alpha Tuning for Hybrid Retrieval in Retrieval-Augmented Generation
Hsin-Ling Hsu
Jengnan Tzeng
44
0
0
29 Mar 2025
When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
Tuo Liang
Zhe Hu
Jing Li
Hao Zhang
Yiren Lu
...
Yiran Qiao
Disheng Liu
Jeirui Peng
Jing Ma
Yu Yin
59
0
0
29 Mar 2025
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Mohammadreza Pourreza
Shayan Talaei
Ruoxi Sun
Xingchen Wan
Hailong Li
Azalia Mirhoseini
Amin Saberi
Sercan O. Arik
ReLM
AI4TS
LRM
46
6
0
29 Mar 2025
FRASE: Structured Representations for Generalizable SPARQL Query Generation
Papa Abdou Karim Karou Diallo
Amal Zouaq
49
0
0
28 Mar 2025
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Wei Shen
Guanlin Liu
Zheng Wu
Ruofei Zhu
Qingping Yang
Chao Xin
Yu Yue
Lin Yan
92
8
0
28 Mar 2025
REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation
Puzhen Yuan
Angyuan Ma
Yunchao Yao
Huaxiu Yao
Masayoshi Tomizuka
Mingyu Ding
LM&Ro
69
1
0
28 Mar 2025
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
Weiqi Li
X. Zhang
Shijie Zhao
Yuyao Zhang
Junlin Li
Li Zhang
Jian Zhang
52
4
0
28 Mar 2025
Probabilistic Uncertain Reward Model
Wangtao Sun
Xiang Cheng
Xing Yu
Haotian Xu
Zhao Yang
Shizhu He
Jun Zhao
Kang Liu
60
0
0
28 Mar 2025
PharmAgents: Building a Virtual Pharma with Large Language Model Agents
B. Gao
Yanwen Huang
Yiqiao Liu
Wenxuan Xie
Wei-Ying Ma
Ya Zhang
Yanyan Lan
LLMAG
LM&Ro
91
0
0
28 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
243
0
0
28 Mar 2025
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
53
0
0
28 Mar 2025
RLDBF: Enhancing LLMs Via Reinforcement Learning With DataBase FeedBack
Weichen Dai
Zijie Dai
Zhijie Huang
Yixuan Pan
Xinhe Li
Xi Li
Yi Zhou
Ji Qi
Wu Jiang
29
0
0
28 Mar 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng
Kaixiong Gong
Yangqiu Song
Zonghao Guo
Yibing Wang
Tianshuo Peng
Jian Wu
Xiaoying Zhang
Benyou Wang
Xiangyu Yue
AI4TS
SyDa
LRM
56
20
0
27 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Wenbo Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
80
4
0
27 Mar 2025
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Ivo Petrov
Jasper Dekoninck
Lyuben Baltadzhiev
Maria Drencheva
Kristian Minchev
Mislav Balunović
Nikola Jovanović
Martin Vechev
LRM
ELM
72
11
0
27 Mar 2025
debug-gym: A Text-Based Environment for Interactive Debugging
Xingdi Yuan
Morgane M Moss
Charbel El Feghali
Chinmay Singh
Darya Moldavskaya
...
Lucas Caccia
Matheus Pereira
Minseon Kim
Alessandro Sordoni
Marc-Alexandre Côté
LLMAG
81
2
0
27 Mar 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
Jike Zhong
Qilong Wu
Xinyue Li
Bo Zhang
Ming Li
...
Haoyang Li
Yu Qiao
Peng Gao
Bin Fu
Zhen Li
EGVM
50
0
0
27 Mar 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun
Ge Yan
Tsui-Wei Weng
KELM
LRM
67
2
0
27 Mar 2025
Unlocking the Potential of Past Research: Using Generative AI to Reconstruct Healthcare Simulation Models
Thomas Monks
Alison Harper
Amy Heather
48
0
0
27 Mar 2025
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning
Zhengxi Lu
Yuxiang Chai
Yaxuan Guo
Xi Yin
Liang Liu
Hao Wang
Han Xiao
Shuai Ren
Guanjing Xiong
Haoyang Li
LLMAG
LRM
89
9
0
27 Mar 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo
Fan Ma
Linchao Zhu
T. Wang
Fengyun Rao
Yi Yang
LRM
79
0
0
26 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
97
7
0
26 Mar 2025
Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework
Soham Sane
MoE
67
0
0
26 Mar 2025
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Salaheddin Alzubi
Creston Brooks
Purva Chiniya
Edoardo Contente
Chiara von Gerlach
...
Arda Kaz
Windsor Nguyen
Sewoong Oh
Himanshu Tyagi
Pramod Viswanath
VLM
ELM
LRM
96
4
0
26 Mar 2025
A multi-agentic framework for real-time, autonomous freeform metasurface design
Robert Lupoiu
Yixuan Shao
Tianxiang Dai
Chenkai Mao
Kofi Edee
Jonathan A. Fan
AI4CE
73
0
0
26 Mar 2025
Cyborg Data: Merging Human with AI Generated Training Data
Kai North
Christopher Ormerod
37
0
0
26 Mar 2025
RALLRec+: Retrieval Augmented Large Language Model Recommendation with Reasoning
Sichun Luo
Jian Xu
Xinsong Zhang
Linrong Wang
Sicong Liu
Hanxu Hou
Linqi Song
RALM
3DV
LRM
84
0
0
26 Mar 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
OffRL
LRM
73
61
0
26 Mar 2025
A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI
Alejandro Lozano
Min Woo Sun
James Burgess
Jeffrey Nirschl
Christopher Polzak
...
Xiaohan Wang
Alfred Seunghoon Song
Chiang Chia-Chun
Robert Tibshirani
Serena Yeung-Levy
LM&MA
102
1
0
26 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
216
3
0
26 Mar 2025
Can Large Language Models Predict Associations Among Human Attitudes?
Ana Ma
Derek Powell
39
0
0
26 Mar 2025
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Kexian Tang
Junyao Gao
Yanhong Zeng
Haodong Duan
Yanan Sun
Zhening Xing
Wenran Liu
Kaifeng Lyu
Kai-xiang Chen
ELM
LRM
61
2
0
25 Mar 2025
Learning to chain-of-thought with Jensen's evidence lower bound
Yunhao Tang
Sid Wang
Rémi Munos
BDL
OffRL
LRM
55
0
0
25 Mar 2025
RL-finetuning LLMs from on- and off-policy data with a single algorithm
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
46
3
0
25 Mar 2025
Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking
Yuyao Ge
Shenghua Liu
Yansen Wang
Lingrui Mei
Lizhe Chen
Baolong Bi
Xueqi Cheng
ReLM
LRM
54
3
0
25 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
J. Hockenmaier
Graham Neubig
Sean Welleck
LRM
58
2
0
25 Mar 2025
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yunjie Ji
Yiping Peng
Han Zhao
Xiangang Li
ReLM
ELM
LRM
59
6
0
25 Mar 2025
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
Max W. Y. Lam
Yijin Xing
Weiya You
Jingcheng Wu
Zongyu Yin
...
T. Zhao
Chien-Hung Liu
Xuchen Song
Yang Li
Yahui Zhou
LRM
64
2
0
25 Mar 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao
Jiashu Qu
Jingyi Tang
Baolong Bi
Yi Liu
Hongyu Chen
Li Liang
Li Su
Qingming Huang
MLLM
VLM
LRM
88
5
0
25 Mar 2025
Iterative Hypothesis Generation for Scientific Discovery with Monte Carlo Nash Equilibrium Self-Refining Trees
Gollam Rabby
Diyana Muhammed
Prasenjit Mitra
Sören Auer
40
1
0
25 Mar 2025