2501.12948
Cited By
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
22 January 2025
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
Ruoyu Zhang
Ran Xu
Qihao Zhu
Shirong Ma
P. Wang
Xiao Bi
Yanling Wang
X. Yu
Yu-Huan Wu
Z. F. Wu
Zhibin Gou
Z. Shao
Zhuoshu Li
Zijian Gao
Aixin Liu
Bing Xue
Bingxuan Wang
Bochao Wu
B. Feng
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenyi Zhang
Chong Ruan
Damai Dai
Deli Chen
Dongjie Ji
Erhang Li
F. Lin
Fucong Dai
Fuli Luo
Guangbo Hao
Guanting Chen
Guozhang Li
Han Zhang
Han Bao
Hanwei Xu
Han Wang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Qu
Hui Li
Jianzhong Guo
Jiashi Li
Jiawei Wang
Jianfei Chen
Jingyang Yuan
Junjie Qiu
Junlong Li
Jianfeng Cai
Jiaqi Ni
Jian Liang
Jin Chen
Kai Dong
Kai Hu
Kaige Gao
Kang Guan
Kexin Huang
Kuai Yu
Lean Wang
Lecong Zhang
Liang Zhao
L. Wang
Liyue Zhang
Lei Xu
Leyi Xia
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Meng Li
Miaojun Wang
Mingming Li
Ning Tian
Panpan Huang
Peng Zhang
Qian Wang
Qinyu Chen
Qiushi Du
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Rongpin Wang
Ruoxin Chen
Rong Jin
Ruyi Chen
Shanghao Lu
Shangyan Zhou
Tian Jin
Shengfeng Ye
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
Papers citing "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"
50 / 1,327 papers shown
Shape and Texture Recognition in Large Vision-Language Models
Sagi Eppel
Mor Bismut
Alona Faktor
3DV
VLM
114
2
0
29 Mar 2025
Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use
Nicholas Roth
Christopher Hidey
Lucas Spangher
William Arnold
Chang Ye
Nick Masiewicki
Jinoo Baek
Peter Grabowski
Eugene Ie
LLMAG
141
0
0
29 Mar 2025
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
152
0
0
28 Mar 2025
PharmAgents: Building a Virtual Pharma with Large Language Model Agents
B. Gao
Yanwen Huang
Yiqiao Liu
Wenxuan Xie
Wei-Ying Ma
Ya Zhang
Yanyan Lan
LLMAG
LM&Ro
164
3
0
28 Mar 2025
FRASE: Structured Representations for Generalizable SPARQL Query Generation
Papa Abdou Karim Karou Diallo
Payel Das
79
0
0
28 Mar 2025
REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation
Puzhen Yuan
Angyuan Ma
Yunchao Yao
Huaxiu Yao
Masayoshi Tomizuka
Mingyu Ding
LM&Ro
136
3
0
28 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
509
0
0
28 Mar 2025
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou
Zhaocheng Zhu
Xuan Li
Mikhail Galkin
Xiao Feng
Sanmi Koyejo
Jian Tang
Bo Han
LRM
181
6
0
28 Mar 2025
Probabilistic Uncertain Reward Model
Wangtao Sun
Xiang Cheng
Xing Yu
Haotian Xu
Zhao Yang
Shizhu He
Jun Zhao
Kang Liu
193
0
0
28 Mar 2025
RLDBF: Enhancing LLMs Via Reinforcement Learning With DataBase FeedBack
Weichen Dai
Zijie Dai
Zhijie Huang
Yixuan Pan
Xinhe Li
Xi Li
Yi Zhou
Ji Qi
Wu Jiang
63
0
0
28 Mar 2025
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Wei Shen
Guanlin Liu
Zheng Wu
Ruofei Zhu
Qingping Yang
Chao Xin
Yu Yue
Lin Yan
169
14
0
28 Mar 2025
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
Weiqi Li
Xinyu Zhang
Shijie Zhao
Yize Zhang
Junlin Li
Li Zhang
Jian Zhang
123
11
0
28 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Weinan Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
209
9
0
27 Mar 2025
Unlocking the Potential of Past Research: Using Generative AI to Reconstruct Healthcare Simulation Models
Thomas Monks
Alison Harper
Amy Heather
94
0
0
27 Mar 2025
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Ivo Petrov
Jasper Dekoninck
Lyuben Baltadzhiev
Maria Drencheva
Kristian Minchev
Mislav Balunović
Nikola Jovanović
Martin Vechev
LRM
ELM
149
23
0
27 Mar 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
Jike Zhong
Qilong Wu
Xinyue Li
Bo Zhang
Ming Li
...
Haoyang Li
Yu Qiao
Peng Gao
Bin Fu
Zhen Li
EGVM
89
1
0
27 Mar 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun
Ge Yan
Tsui-Wei Weng
KELM
LRM
118
3
0
27 Mar 2025
debug-gym: A Text-Based Environment for Interactive Debugging
Xingdi Yuan
Morgane M Moss
Charbel El Feghali
Chinmay Singh
Darya Moldavskaya
...
Lucas Caccia
Matheus Pereira
Minseon Kim
Alessandro Sordoni
Marc-Alexandre Côté
LLMAG
140
2
0
27 Mar 2025
Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1
Birger Moëll
Fredrik Sand Aronsson
Sanian Akbar
ELM
LRM
72
1
0
27 Mar 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng
Kaixiong Gong
Yangqiu Song
Zonghao Guo
Yibing Wang
Tianshuo Peng
Jian Wu
Xiaoying Zhang
Benyou Wang
Xiangyu Yue
AI4TS
SyDa
LRM
197
62
0
27 Mar 2025
Can Large Language Models Predict Associations Among Human Attitudes?
Ana Ma
Derek Powell
96
0
0
26 Mar 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo
Fan Ma
Linchao Zhu
T. Wang
Fengyun Rao
Yi Yang
LRM
163
0
0
26 Mar 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Jialin Li
OffRL
LRM
248
172
0
26 Mar 2025
A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI
Alejandro Lozano
Min Woo Sun
James Burgess
Jeffrey Nirschl
Christopher Polzak
...
Xiaohan Wang
Alfred Seunghoon Song
Chiang Chia-Chun
Robert Tibshirani
Serena Yeung-Levy
LM&MA
182
2
0
26 Mar 2025
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Salaheddin Alzubi
Creston Brooks
Purva Chiniya
Edoardo Contente
Chiara von Gerlach
...
Arda Kaz
Windsor Nguyen
Sewoong Oh
Himanshu Tyagi
Pramod Viswanath
VLM
ELM
LRM
196
12
0
26 Mar 2025
A multi-agentic framework for real-time, autonomous freeform metasurface design
Robert Lupoiu
Yixuan Shao
Tianxiang Dai
Chenkai Mao
Kofi Edee
Jonathan A. Fan
AI4CE
110
1
0
26 Mar 2025
Cyborg Data: Merging Human with AI Generated Training Data
Kai North
Christopher Ormerod
72
0
0
26 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
442
4
0
26 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
224
0
0
26 Mar 2025
Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework
Soham Sane
MoE
102
0
0
26 Mar 2025
RALLRec+: Retrieval Augmented Large Language Model Recommendation with Reasoning
Sichun Luo
Jian Xu
Xinsong Zhang
Linrong Wang
Sicong Liu
Hanxu Hou
Linqi Song
RALM
3DV
LRM
124
0
0
26 Mar 2025
RL-finetuning LLMs from on- and off-policy data with a single algorithm
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
106
4
0
25 Mar 2025
Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking
Yuyao Ge
Shenghua Liu
Yansen Wang
Lingrui Mei
Lizhe Chen
Baolong Bi
Xueqi Cheng
ReLM
LRM
101
6
0
25 Mar 2025
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yunjie Ji
Yiping Peng
Han Zhao
Xiangang Li
ReLM
ELM
LRM
117
10
0
25 Mar 2025
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
Max W. Y. Lam
Yijin Xing
Weiya You
Jingcheng Wu
Zongyu Yin
...
T. Zhao
Chien-Hung Liu
Xuchen Song
Yang Li
Yahui Zhou
LRM
108
4
0
25 Mar 2025
Iterative Hypothesis Generation for Scientific Discovery with Monte Carlo Nash Equilibrium Self-Refining Trees
Gollam Rabby
Diyana Muhammed
Prasenjit Mitra
Sören Auer
87
2
0
25 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
Julia Hockenmaier
Graham Neubig
Sean Welleck
LRM
110
5
0
25 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
185
11
0
25 Mar 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao
Jiashu Qu
Jingyi Tang
Baolong Bi
Yi Liu
Hongyu Chen
Li Liang
Li Su
Qingming Huang
MLLM
VLM
LRM
179
6
0
25 Mar 2025
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Kexian Tang
Junyao Gao
Yanhong Zeng
Haodong Duan
Yanan Sun
Zhening Xing
Wenran Liu
Kaifeng Lyu
Kai-xiang Chen
ELM
LRM
163
9
0
25 Mar 2025
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim
Taehoon Yoon
Jisung Hwang
Minhyuk Sung
DiffM
181
3
0
25 Mar 2025
Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective
Changlun Li
Yao Shi
Yuyu Luo
Nan Tang
AIFin
115
0
0
24 Mar 2025
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
Zhenyu Pan
Han Liu
OffRL
LRM
148
7
0
24 Mar 2025
RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation
Xiaolong Yin
Xingyu Lu
Jiahang Shen
Jingzhe Ni
Hailong Li
Ruofeng Tong
Min Tang
Peng Du
3DV
87
2
0
24 Mar 2025
Video-T1: Test-Time Scaling for Video Generation
Fan Liu
Hanyang Wang
Yimo Cai
Kaiyan Zhang
Xiaohang Zhan
Yueqi Duan
DiffM
VGen
160
7
0
24 Mar 2025
Evolutionary Policy Optimization
Jianren Wang
Yifan Su
Abhinav Gupta
Deepak Pathak
92
0
0
24 Mar 2025
Language Model Uncertainty Quantification with Attention Chain
Yinghao Li
Rushi Qiang
Lama Moukheiber
Chao Zhang
LRM
95
3
0
24 Mar 2025
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities
Weixiang Zhao
Xingyu Sui
Jiahe Guo
Yulin Hu
Yang Deng
Yanyan Zhao
Bing Qin
Wanxiang Che
Tat-Seng Chua
Ting Liu
ELM
LRM
132
9
0
23 Mar 2025
AgentRxiv: Towards Collaborative Autonomous Research
Samuel Schmidgall
Michael Moor
191
8
0
23 Mar 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Zhiyu Lin
Yifei Gao
Xian Zhao
Yunfan Yang
Jitao Sang
LRM
157
5
0
23 Mar 2025