OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training

Modern frameworks for training large foundation models (LFMs) employ dataloaders in a data-parallel manner, with each loader processing a disjoint subset of the training data. Under multi-source preprocessing, two fundamental challenges arise. First, because of the quadratic computational complexity of the attention operator, the non-uniform sample distribution across data-parallel ranks leads to significant workload imbalance among dataloaders, degrading training efficiency. Second, supporting diverse data sources requires per-dataset file-access state that is redundantly replicated across parallel loaders, consuming excessive memory. This also hinders dynamic data mixing (e.g., curriculum learning) and causes redundant access and memory overhead under hybrid parallelism.

We present OVERLORD, an industrial-grade distributed data-loading architecture for LFMs, with four innovations: (1) disaggregated data preprocessing via role-specific actors (Source Loaders and Data Constructors) that eliminates redundant data access across sources and parallelism dimensions and ensures multi-source scalability; (2) a centralized, declarative data plane for elastic multi-source orchestration, such as long-short context mixing, multimodality, and curriculum learning; (3) a multi-level auto-partitioning and scaling mechanism for Source Loaders under heterogeneous preprocessing costs; and (4) shadow loaders with differential checkpointing for fault recovery without workflow interruption. Deployed on production clusters scaling to multi-thousand GPUs, OVERLORD achieves (1) a 4.5x improvement in end-to-end training throughput and (2) a 13.5x reduction in CPU memory usage.
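To make the first two ideas concrete, below is a minimal, framework-agnostic Python sketch of one Source Loader per data source feeding a Data Constructor that shards work by a token-count-squared proxy for attention cost. The class names, the mixing spec, and the greedy packing policy are illustrative assumptions, not OVERLORD's actual actor API or scheduling algorithm.

# Hypothetical sketch: one loader per source (state held once, not per DP rank),
# plus a constructor that mixes sources declaratively and balances ranks by an
# attention-cost proxy. Names and policies are assumptions for illustration.
import random
from typing import Dict, Iterator, List


class SourceLoader:
    """Owns the file-access state of exactly ONE data source, so cursor and
    shuffle state are not replicated across data-parallel ranks."""

    def __init__(self, name: str, num_samples: int, seed: int = 0):
        self.name = name
        self.rng = random.Random(seed)
        self.order = list(range(num_samples))
        self.rng.shuffle(self.order)
        self.cursor = 0  # single copy of per-source progress state

    def next_sample(self) -> Dict:
        idx = self.order[self.cursor % len(self.order)]
        self.cursor += 1
        # Source-dependent sequence length, mimicking the non-uniform length
        # distribution that causes attention-cost imbalance across ranks.
        return {"source": self.name, "tokens": [idx] * self.rng.randint(64, 512)}

    def state(self) -> Dict:
        return {"cursor": self.cursor}  # tiny per-source checkpoint


class DataConstructor:
    """Builds per-rank shards from a declarative mixing spec."""

    def __init__(self, loaders: Dict[str, SourceLoader], mix: Dict[str, float]):
        self.loaders = loaders
        self.mix = mix  # e.g. {"web": 0.7, "code": 0.3}; adjustable at runtime

    def batches(self, dp_ranks: int, batch_size: int) -> Iterator[List[List[Dict]]]:
        names = list(self.mix)
        weights = [self.mix[n] for n in names]
        rng = random.Random(42)
        while True:
            # Draw samples according to the current mixture, then balance ranks
            # by total attention-cost proxy rather than by sample count.
            samples = [self.loaders[rng.choices(names, weights)[0]].next_sample()
                       for _ in range(dp_ranks * batch_size)]
            samples.sort(key=lambda s: len(s["tokens"]), reverse=True)
            shards: List[List[Dict]] = [[] for _ in range(dp_ranks)]
            loads = [0] * dp_ranks
            for s in samples:
                r = loads.index(min(loads))        # least-loaded rank first
                shards[r].append(s)
                loads[r] += len(s["tokens"]) ** 2  # quadratic attention-cost proxy
            yield shards


if __name__ == "__main__":
    loaders = {"web": SourceLoader("web", 10_000), "code": SourceLoader("code", 2_000)}
    constructor = DataConstructor(loaders, mix={"web": 0.7, "code": 0.3})
    shards = next(constructor.batches(dp_ranks=4, batch_size=8))
    print([sum(len(s["tokens"]) ** 2 for s in shard) for shard in shards])

In this sketch, checkpointing a source costs one small cursor per dataset instead of one replica per parallel loader, which is the redundancy the abstract attributes to conventional data-parallel loading.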
@article{zhao2025_2504.09844,
  title   = {OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training},
  author  = {Juntao Zhao and Qi Lu and Wei Jia and Borui Wan and Lei Zuo and Junda Feng and Jianyu Jiang and Yangrui Chen and Shuaishuai Cao and Jialing He and Kaihua Jiang and Yuanzhe Hu and Shibiao Nong and Yanghua Peng and Haibin Lin and Xin Liu and Chuan Wu},
  journal = {arXiv preprint arXiv:2504.09844},
  year    = {2025}
}