Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.14476
Cited By
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
18 March 2025
Qiying Yu
Zhe Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
Yu Yue
Tiantian Fan
Gaohong Liu
L. Liu
Xin Liu
H. Lin
Zhiqi Lin
Bole Ma
Guangming Sheng
Yuxuan Tong
Chenyi Zhang
Mofan Zhang
Wang Zhang
Hang Zhu
Jinhua Zhu
Jiaze Chen
Jiangjie Chen
Chunyang Wang
Hongli Yu
W. Dai
Yuxuan Song
Xiangpeng Wei
Hao Zhou
Jingjing Liu
Wei-Ying Ma
Ya-Qin Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DAPO: An Open-Source LLM Reinforcement Learning System at Scale"
50 / 60 papers shown
Title
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Yurun Yuan
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
26
0
0
21 May 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Tencent Hunyuan Team
Ao Liu
Botong Zhou
Can Xu
Chayse Zhou
...
Bingxin Qu
Bolin Ni
Boyu Wu
Chen Li
Cheng-peng Jiang
MoE
LRM
AI4CE
12
0
0
21 May 2025
An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents
Bowen Jin
Jinsung Yoon
Priyanka Kargupta
Sercan Ö. Arik
Jiawei Han
LRM
14
0
0
21 May 2025
DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data
Yuhang Zhou
Jing Zhu
Shengyi Qian
Zhuokai Zhao
Xiyao Wang
Xiaoyu Liu
Ming Li
Paiheng Xu
Wei Ai
Furong Huang
12
0
0
21 May 2025
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities
Jinyang Wu
Chonghua Liao
Mingkuan Feng
Shuai Zhang
Zhengqi Wen
Pengpeng Shao
Huazhe Xu
Jianhua Tao
OffRL
LRM
12
0
0
21 May 2025
lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu
Mingjia Huo
Yuanyuan Zhang
Haoyang Yu
Eric P. Xing
Ion Stoica
Tajana Rosing
Haojian Jin
Hao Zhang
12
0
0
21 May 2025
Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen
Jiarui Lu
Minsu Kim
Dinghuai Zhang
Jian Tang
Alexandre Piché
Nicolas Angelard-Gontier
Yoshua Bengio
Ehsan Kamalloo
ReLM
LRM
25
0
0
20 May 2025
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Dejing Dou
LRM
23
0
0
20 May 2025
RLVR-World: Training World Models with Reinforcement Learning
Jialong Wu
Shaofeng Yin
Ningya Feng
Mingsheng Long
OffRL
VGen
17
0
0
20 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM
LRM
14
0
0
19 May 2025
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Soumya Rani Samineni
Durgesh Kalwar
Karthik Valmeekam
Kaya Stechly
Subbarao Kambhampati
OffRL
14
0
0
19 May 2025
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu
Xueyi Chen
Rui Ming
Jilong Gao
Shoubo Hu
Zhuolun He
Bei Yu
LRM
24
0
0
19 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
YangouOuyang
...
Feiyu Xiong
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
7
0
0
18 May 2025
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
17
0
0
18 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
16
0
0
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
17
0
0
18 May 2025
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng
Quan Wei
William Brown
Oana Frunza
Yuriy Nevmyvaka
Mingyi Hong
LRM
12
0
0
17 May 2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu
Jiahao Lin
Qichao Zhang
Xiangyu Tian
Linjing Li
Xiangyuan Lan
Dongbin Zhao
OffRL
ReLM
LRM
21
0
0
16 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS
LRM
22
0
0
16 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
17
0
0
16 May 2025
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Francesco Sovrano
22
0
0
16 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
Bo Shen
Cici
Dawei Zhu
...
Yishuo Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE
ReLM
LRM
AI4CE
48
2
0
12 May 2025
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Zhenghao Xing
Xiaowei Hu
Chi-Wing Fu
Wei Wang
Jifeng Dai
Pheng-Ann Heng
MLLM
OffRL
VLM
LRM
55
0
0
07 May 2025
RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation
Keyu Chen
Wenchao Sun
Hao Cheng
Sifa Zheng
52
0
0
06 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Z. Zhang
Tingting Gao
Liang Wang
OffRL
LRM
48
0
0
05 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
62
0
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
76
2
0
05 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
Yiming Li
LRM
72
3
0
01 May 2025
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Haoran Xu
Baolin Peng
Hany Awadalla
DongDong Chen
Yen-Chun Chen
...
Yelong Shen
Shuaiqiang Wang
Weijian Xu
Jianfeng Gao
Weizhu Chen
ReLM
LRM
82
3
0
30 Apr 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Rongxiang Weng
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
58
1
0
30 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
131
9
0
29 Apr 2025
GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets
Mingqian He
Fei Zhao
Chonggang Lu
Zichen Liu
Yishuo Wang
Haofu Qian
OffRL
AI4TS
VLM
72
0
0
28 Apr 2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning
W. L. Xiao
Leilei Gan
Weilong Dai
Wanggui He
Ziwei Huang
...
Fangxun Shu
Zhelun Yu
Peng Zhang
Hao Jiang
Fei Wu
ReLM
LRM
AI4CE
239
1
0
25 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Li Sheng
Ganqu Cui
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
189
5
0
22 Apr 2025
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning
Cheng Wen
Tingwei Guo
Shuaijiang Zhao
Wei Zou
Xiangang Li
OffRL
AuLLM
LRM
62
3
0
22 Apr 2025
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Yinmin Zhong
Zili Zhang
Xiaoniu Song
Hanpeng Hu
Chao Jin
...
Changyi Wan
Hongyu Zhou
Yimin Jiang
Yibo Zhu
Daxin Jiang
OffRL
AI4TS
64
0
0
22 Apr 2025
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
X. Zhang
Rongxiang Weng
Zifei Cheng
Wenhao Zhuang
Zheng Lin
...
Shouyu Yin
Chaohang Wen
Haotian Zhang
Bin Chen
Bing Yu
LRM
43
5
0
19 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
61
21
0
18 Apr 2025
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Junlin Wang
Shang Zhu
Jon Saad-Falcon
Ben Athiwaratkun
Qingyang Wu
Jue Wang
Shuaiwen Leon Song
Ce Zhang
Bhuwan Dhingra
James Y. Zou
LRM
53
1
0
18 Apr 2025
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Yixuan Even Xu
Yash Savani
Fei Fang
Zico Kolter
OffRL
44
2
0
18 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
212
1
0
17 Apr 2025
ToolRL: Reward is All Tool Learning Needs
Cheng Qian
Emre Can Acikgoz
Qi He
Hongru Wang
Xiusi Chen
Dilek Hakkani-Tur
Gokhan Tur
Heng Ji
OffRL
LRM
38
7
0
16 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
196
4
0
15 Apr 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Wei Xiong
Jiarui Yao
Yuhui Xu
Bo Pang
Lei Wang
...
Junnan Li
Nan Jiang
Tong Zhang
Caiming Xiong
Hanze Dong
OffRL
LRM
48
10
0
15 Apr 2025
Deep Reasoning Translation via Reinforcement Learning
Jiaan Wang
Fandong Meng
Jie Zhou
OffRL
LRM
33
0
0
14 Apr 2025
Heimdall: test-time scaling on the generative verification
Wenlei Shi
Xing Jin
LRM
33
2
0
14 Apr 2025
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning
Zhaopeng Feng
Shaosheng Cao
Jiahan Ren
Jiayuan Su
Ruizhe Chen
Yan Zhang
Zhe Xu
Yao Hu
Jian Wu
Zuozhu Liu
ALM
LRM
63
3
0
14 Apr 2025
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Xingjian Zhang
Siwei Wen
Wenjun Wu
Lei Huang
LRM
40
2
0
13 Apr 2025
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Wang Yang
Xiang Yue
V. Chaudhary
Xiaotian Han
ReLM
LRM
78
2
0
12 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
Cengiz Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
177
2
0
10 Apr 2025
1
2
Next