Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

We introduce ROLL, an efficient, scalable, and user-friendly library designed for Reinforcement Learning Optimization for Large-scale Learning. ROLL caters to three primary user groups: tech pioneers aiming for cost-effective, fault-tolerant large-scale training, developers requiring flexible control over training workflows, and researchers seeking agile experimentation. ROLL is built upon several key modules to serve these user groups effectively. First, a single-controller architecture combined with a parallel worker abstraction simplifies the development of the training pipeline. Second, the parallel strategy and data transfer modules enable efficient and scalable training. Third, the rollout scheduler offers fine-grained management of each sample's lifecycle during the rollout stage. Fourth, the environment worker and reward worker support rapid and flexible experimentation with agentic RL algorithms and reward designs. Finally, AutoDeviceMapping allows users to flexibly assign resources to different models across the various stages.
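To make the single-controller design concrete, here is a minimal sketch in plain Python of how one controller might drive a rollout stage and a reward stage over parallel workers whose device placement is declared up front. All names here (`Controller`, `ParallelWorker`, `DeviceMapping`, and their methods) are illustrative assumptions for exposition, not identifiers from ROLL's actual API.

```python
# Hypothetical sketch of a single-controller RL pipeline; the class and
# method names are assumptions for illustration, not ROLL's real API.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeviceMapping:
    """Declares which device ids each role occupies (cf. AutoDeviceMapping)."""
    assignments: Dict[str, List[int]] = field(default_factory=dict)

    def devices_for(self, role: str) -> List[int]:
        return self.assignments.get(role, [])


class ParallelWorker:
    """Abstracts one parallel unit (actor, environment, reward, ...)."""

    def __init__(self, role: str, mapping: DeviceMapping):
        self.role = role
        self.devices = mapping.devices_for(role)


class ActorWorker(ParallelWorker):
    def rollout(self, prompts: List[str]) -> List[str]:
        # Placeholder: a real actor would run model generation here,
        # sharded across self.devices.
        return [p + " -> response" for p in prompts]


class RewardWorker(ParallelWorker):
    def score(self, samples: List[str]) -> List[float]:
        # Placeholder toy reward: sample length stands in for a real signal.
        return [float(len(s)) for s in samples]


class Controller:
    """Single controller: the train step reads as one sequential program,
    while each worker may itself be parallel across its assigned devices."""

    def __init__(self, actor: ActorWorker, reward: RewardWorker):
        self.actor, self.reward = actor, reward

    def train_step(self, prompts: List[str]) -> List[float]:
        samples = self.actor.rollout(prompts)  # rollout stage
        return self.reward.score(samples)      # reward stage


if __name__ == "__main__":
    mapping = DeviceMapping({"actor": [0, 1], "reward": [2]})
    ctrl = Controller(ActorWorker("actor", mapping),
                      RewardWorker("reward", mapping))
    print(ctrl.train_step(["What is RL?"]))
```

The point of the pattern is that the training pipeline stays a short, readable sequential script in one place, while decisions about parallelism and device placement live in the worker and mapping objects and can change without touching the pipeline logic.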
@article{wang2025_2506.06122,
  title={Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library},
  author={Weixun Wang and Shaopan Xiong and Gengru Chen and Wei Gao and Sheng Guo and Yancheng He and Ju Huang and Jiaheng Liu and Zhendong Li and Xiaoyang Li and Zichen Liu and Haizhou Zhao and Dakai An and Lunxi Cao and Qiyang Cao and Wanxi Deng and Feilei Du and Yiliang Gu and Jiahe Li and Xiang Li and Mingjie Liu and Yijia Luo and Zihe Liu and Yadao Wang and Pei Wang and Tianyuan Wu and Yanan Wu and Yuheng Zhao and Shuaibing Zhao and Jin Yang and Siran Yang and Yingshui Tan and Huimin Yi and Yuchi Xu and Yujin Yuan and Xingyao Zhang and Lin Qu and Wenbo Su and Wei Wang and Jiamang Wang and Bo Zheng},
  journal={arXiv preprint arXiv:2506.06122},
  year={2025}
}