Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference

19 May 2025
Shuqing Luo
Pingzhi Li
Jie Peng
Hanrui Wang
Yang Zhao
Yu Cheng
Tianlong Chen
    MoE
ArXiv (abs) · PDF · HTML
Main: 9 pages · 13 figures · 6 tables · Bibliography: 3 pages · Appendix: 7 pages
Abstract

Mixture-of-experts (MoE) architectures can achieve impressive computational efficiency with expert parallelism, which relies heavily on all-to-all communication across devices. Unfortunately, such communication overhead typically constitutes a significant portion of the total runtime, hampering the scalability of distributed training and inference for modern MoE models (consuming over 40% of the runtime in large-scale training). In this paper, we first define collaborative communication to illustrate this intrinsic limitation, and then propose system- and algorithm-level innovations to reduce communication costs. Specifically, given a pair of experts co-activated by one token, we call them "collaborated", which comprises two cases, intra- and inter-collaboration, depending on whether the experts are kept on the same device. Our pilot investigations reveal that augmenting the proportion of intra-collaboration can accelerate expert parallelism at scale. This motivates us to strategically optimize collaborative communication for accelerated MoE training and inference, dubbed Occult. Our designs either deliver exact results with reduced communication cost or controllably minimize the cost via collaboration pruning, materialized by modified fine-tuning. Comprehensive experiments on various MoE-LLMs demonstrate that Occult can be faster than popular state-of-the-art inference and training frameworks (more than 1.5× speedup across multiple tasks and models) with comparable or superior quality to standard fine-tuning. Code is available at this https URL.
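To make the intra- vs. inter-collaboration distinction concrete, below is a minimal sketch (not the paper's implementation) that estimates the proportion of intra-collaboration for a batch of routed tokens, i.e., the fraction of per-token co-activated expert pairs that live on the same device. The function name, tensor shapes, and the toy expert placement are illustrative assumptions.

import torch

def intra_collaboration_ratio(topk_expert_ids, expert_to_device):
    """Fraction of per-token co-activated expert pairs hosted on the same device.

    topk_expert_ids: (num_tokens, k) expert indices chosen by the router.
    expert_to_device: (num_experts,) device id that hosts each expert.
    """
    # Device hosting each expert selected for each token.
    devices = expert_to_device[topk_expert_ids]            # (num_tokens, k)
    # Pairwise device comparison within each token's top-k set.
    same = devices.unsqueeze(2) == devices.unsqueeze(1)    # (num_tokens, k, k)
    k = topk_expert_ids.shape[1]
    pair_mask = ~torch.eye(k, dtype=torch.bool)            # drop self-pairs
    return same[:, pair_mask].float().mean().item()

# Toy usage: 8 experts placed evenly on 2 devices, top-2 routing.
expert_to_device = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
topk_expert_ids = torch.randint(0, 8, (16, 2))
print(f"intra-collaboration ratio: {intra_collaboration_ratio(topk_expert_ids, expert_to_device):.2f}")

A higher ratio means fewer token activations must cross devices in the all-to-all exchange, which is the quantity Occult seeks to increase through expert placement and collaboration pruning.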

View on arXiv
@article{luo2025_2505.13345,
  title={ Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference },
  author={ Shuqing Luo and Pingzhi Li and Jie Peng and Hanrui Wang and Yang Zhao and Yu Cheng and Tianlong Chen },
  journal={arXiv preprint arXiv:2505.13345},
  year={ 2025 }
}