MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness

27 March 2025
Zihao Zheng
Xiuping Cui
Size Zheng
Maoliang Li
Jiayu Chen
Yun Liang
Xiang Chen
Abstract

With the advances in artificial intelligence, Mixture-of-Experts (MoE) has become the dominant architecture for Large Language Models (LLMs), and its demand for model compression is growing. Quantization is an effective method that not only compresses models but also significantly accelerates inference. Existing quantization methods have gradually shifted their focus from parameter scaling to the analysis of data distributions. However, this analysis is designed for dense LLMs and is suboptimal for MoE quantization because of MoEs' complex data-model distribution. To address this problem, we decouple the complexity of MoEs' data-model distribution into a multi-stage analysis and reveal MoEs' inherent dynamics. The analysis shows that expert performance in an MoE varies dynamically both within and across data distributions. Based on these findings, we design two quantization strategies with data-model distribution awareness and integrate them into an end-to-end framework for MoE quantization, named MoQa. MoQa applies an expert-level mixed-precision base quantization with distribution awareness. Moreover, MoQa uses a channel-level quantization adjustment to dynamically adapt expert performance to novel distributions. Experiments show that MoQa's base quantization achieves a 0.49~8.51 PPL decrease on known distributions. With the adjustments, MoQa achieves a 2.74~6.44 PPL decrease and 1.85%~3.77% average accuracy improvements on novel distributions. We believe MoQa will play a role in future MoE construction, optimization, and compression.
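As a rough illustration of what an expert-level mixed-precision scheme can look like, the sketch below assigns per-expert bit-widths from a toy sensitivity score (routing frequency times weight spread) under an average bit-width budget, then applies uniform symmetric quantization. The scoring heuristic, the greedy budget allocation, and all function names here are illustrative assumptions for this page, not MoQa's actual criteria or code.

```python
import numpy as np

def assign_expert_bitwidths(expert_weights, expert_usage, budget_bits=4.0,
                            candidate_bits=(2, 3, 4, 8)):
    """Toy expert-level mixed-precision assignment (illustrative, not MoQa's rule).

    expert_weights: list of 2-D weight arrays, one per expert.
    expert_usage:   per-expert routing frequency estimated on calibration data.
    budget_bits:    target average bit-width across experts.
    """
    # Sensitivity proxy: experts routed to more often, with larger weight spread,
    # are assumed to need higher precision.
    scores = np.array([u * w.std() for u, w in zip(expert_usage, expert_weights)])
    order = np.argsort(-scores)              # most sensitive experts first

    bits = np.full(len(expert_weights), min(candidate_bits), dtype=float)
    for idx in order:                         # greedily raise precision while under budget
        for b in sorted(candidate_bits):
            if b > bits[idx]:
                trial = bits.copy()
                trial[idx] = b
                if trial.mean() <= budget_bits:
                    bits[idx] = b
    return bits.astype(int)


def quantize_symmetric(w, n_bits):
    """Uniform symmetric per-tensor quantization with round-to-nearest."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    experts = [rng.normal(scale=s, size=(64, 64)) for s in (0.5, 1.0, 0.1, 2.0)]
    usage = [0.40, 0.35, 0.05, 0.20]          # routing frequencies from calibration data
    bits = assign_expert_bitwidths(experts, usage)
    quantized = [quantize_symmetric(w, b) for w, b in zip(experts, bits)]
    print("per-expert bit-widths:", bits)
```

Under this toy allocation, rarely routed experts end up at the lowest candidate precision while heavily used experts receive more bits, which is the general flavor of distribution-aware mixed precision; the paper's channel-level adjustment for novel distributions is not modeled here.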

@article{zheng2025_2503.21135,
  title={MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness},
  author={Zihao Zheng and Xiuping Cui and Size Zheng and Maoliang Li and Jiayu Chen and Yun Liang and Xiang Chen},
  journal={arXiv preprint arXiv:2503.21135},
  year={2025}
}