Recent advances in reinforcement learning (RL) have shown promise for optimizing virtual machine scheduling (VMS) in small-scale clusters. However, the application of RL to large-scale cloud computing scenarios remains constrained. This paper introduces a scalable RL framework, called Cluster Value Decomposition Reinforcement Learning (CVD-RL), to overcome the scalability hurdles inherent in large-scale VMS. The CVD-RL framework combines a decomposition operator with a look-ahead operator to manage representation complexity, complemented by a Top-k filter operator that improves exploration efficiency. Unlike existing approaches, which are limited to small clusters of physical machines (PMs), CVD-RL extends its applicability to substantially larger environments. Furthermore, in empirical studies CVD-RL demonstrates generalization capabilities that surpass contemporary SOTA methodologies across a variety of scenarios. These results not only showcase the framework's scalability and performance but also represent a significant step forward in applying RL to VMS within complex, large-scale cloud infrastructures. The code is available at this https URL.
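The abstract names three operators but does not describe their mechanics. The sketch below is one plausible way they could compose at decision time: per-PM value decomposition, a Top-k candidate filter, then a one-step look-ahead over the surviving candidates. Every name and callable here (per_pm_value_fn, simulate_step, value_fn) is a hypothetical illustration under our own assumptions, not the paper's actual implementation.

```python
import numpy as np

def decompose_values(vm_request, pm_states, per_pm_value_fn):
    """Decomposition operator (assumed): score each PM independently so
    the representation grows linearly, not combinatorially, with
    cluster size."""
    return np.array([per_pm_value_fn(vm_request, s) for s in pm_states])

def top_k_filter(values, k):
    """Top-k filter operator (assumed): keep only the k highest-valued
    PMs as placement candidates, shrinking the exploration space."""
    return np.argsort(values)[-k:]

def look_ahead(vm_request, pm_states, candidates, simulate_step, value_fn):
    """Look-ahead operator (assumed): one-step rollout of each candidate
    placement, picking the PM whose successor cluster state scores best."""
    best_pm, best_score = None, -np.inf
    for pm in candidates:
        next_states = simulate_step(vm_request, pm_states, pm)  # hypothetical simulator
        score = value_fn(next_states)
        if score > best_score:
            best_pm, best_score = pm, score
    return best_pm

def schedule(vm_request, pm_states, per_pm_value_fn, simulate_step, value_fn, k=8):
    """Compose the three operators into a single placement decision."""
    values = decompose_values(vm_request, pm_states, per_pm_value_fn)
    candidates = top_k_filter(values, k)
    return look_ahead(vm_request, pm_states, candidates, simulate_step, value_fn)
```

The design intuition this sketch encodes is that the decomposition keeps per-decision cost linear in the number of PMs, while the Top-k filter caps the (more expensive) look-ahead at a constant number of rollouts regardless of cluster size.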
@article{sheng2025_2503.00537,
  title   = {Scalable Reinforcement Learning for Virtual Machine Scheduling},
  author  = {Junjie Sheng and Jiehao Wu and Haochuan Cui and Yiqiu Hu and Wenli Zhou and Lei Zhu and Qian Peng and Wenhao Li and Xiangfeng Wang},
  journal = {arXiv preprint arXiv:2503.00537},
  year    = {2025}
}