VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Main: 9 pages, 7 figures, 4 tables; Bibliography: 5 pages; Appendix: 9 pages
Abstract
Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge due to the heterogeneous model architectures required to process diverse modalities, necessitating sophisticated system design for efficient large-scale training. Existing frameworks typically entangle model definition with parallel logic, resulting in limited scalability and substantial engineering overhead for end-to-end omni-modal training.