
Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs

Main: 16 pages
6 figures
Bibliography: 4 pages
4 tables
Abstract

Mixture-of-Experts (MoE) architectures have gained popularity through their successful adoption in large language models (LLMs). In this work, we introduce Privacy-preserving Collaborative Mixture-of-Experts (PC-MoE), which leverages the sparsity of the MoE architecture for memory-efficient, decentralized, collaborative LLM training, enabling multiple parties with limited GPU memory and data resources to collectively train more capable LLMs than any could achieve individually. At the same time, the approach protects each participant's training data privacy by keeping the training data, as well as parts of the forward-pass signal and gradients, local to each party. By design, PC-MoE synergistically combines the strengths of distributed computation with strong confidentiality assurances. Unlike most privacy-preserving schemes, which pay for confidentiality with lower task accuracy, our framework breaks that trade-off: across seven popular LLM benchmarks, it nearly matches (and sometimes exceeds) the performance and convergence rate of a fully centralized model, reduces peak GPU memory usage by almost 70%, and remains fully robust against reconstruction attacks.
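The abstract does not spell out the protocol, but the general idea of expert-sparse collaborative training can be illustrated with a minimal, single-process PyTorch sketch. Everything below is an illustrative assumption rather than the paper's actual PC-MoE implementation: the Party and CollaborativeMoELayer names, the top-k routing, and the assumption that experts are partitioned across parties so that only hidden activations for selected experts cross party boundaries.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward expert; in a collaborative setting it would live on a single party."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_model))
    def forward(self, x):
        return self.net(x)

class Party:
    """Holds a private shard of experts; it receives only hidden activations,
    never another party's raw training data (an assumption of this sketch)."""
    def __init__(self, expert_ids, d_model, d_hidden):
        self.experts = {i: Expert(d_model, d_hidden) for i in expert_ids}
    def run_expert(self, expert_id, hidden):
        return self.experts[expert_id](hidden)

class CollaborativeMoELayer(nn.Module):
    """Sparse MoE layer whose experts are partitioned across parties.
    The router stays local, and each party stores only its own experts,
    which is the source of the memory savings in this toy setup."""
    def __init__(self, expert_to_party, d_model, top_k=2):
        super().__init__()
        self.expert_to_party = expert_to_party        # expert id -> Party holding it
        self.router = nn.Linear(d_model, len(expert_to_party))
        self.top_k = top_k

    def forward(self, x):                             # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        weights, ids = torch.topk(probs, self.top_k, dim=-1)
        outputs = []
        for t in range(x.size(0)):
            # Only the hidden activation x[t] is sent to the party owning each expert.
            mixed = sum(weights[t, k] *
                        self.expert_to_party[ids[t, k].item()]
                        .run_expert(ids[t, k].item(), x[t])
                        for k in range(self.top_k))
            outputs.append(mixed)
        return torch.stack(outputs)

# Toy usage: four experts split across two parties, simulated in one process.
d_model, d_hidden = 16, 32
party_a = Party([0, 1], d_model, d_hidden)
party_b = Party([2, 3], d_model, d_hidden)
layer = CollaborativeMoELayer({0: party_a, 1: party_a, 2: party_b, 3: party_b}, d_model)
tokens = torch.randn(5, d_model)
print(layer(tokens).shape)   # torch.Size([5, 16])

In the paper's setting the parties would run on separate machines and each would update only its own experts, with activations and partial gradients exchanged over the network; here everything is simulated in a single process purely to show the expert-partitioning idea.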

@article{zhang2025_2506.02965,
  title={PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs},
  author={Ze Yu Zhang and Bolin Ding and Bryan Kian Hsiang Low},
  journal={arXiv preprint arXiv:2506.02965},
  year={2025}
}