SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and
Scalable Large Mixture-of-Experts Models

v1v2 (latest)

SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models

29 October 2023

Yiran Chen

ArXiv (abs)PDF HTML Github (17★)

Papers citing "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"

3 / 3 papers shown

Title
DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference Yujie Zhang Shivam Aggarwal T. Mitra MoE 167 1 0 16 Dec 2024
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference Andrii Skliar T. V. Rozendaal Romain Lepert Todor Boinovski M. V. Baalen Markus Nagel Paul N. Whatmough B. Bejnordi MoE 174 2 0 27 Nov 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts Weilin Cai Juyong Jiang Le Qin Junwei Cui Sunghun Kim Jiayi Huang 185 10 0 07 Apr 2024

We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from. See our policy.