arXiv: 2310.18859 · v2 (latest)
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
29 October 2023
Zhixu Du
Shiyu Li
Yuhao Wu
Xiangyu Jiang
Jingwei Sun
Qilin Zheng
Yongkai Wu
Ang Li
Hai Helen Li
Yiran Chen
MoE
ArXiv (abs) · PDF · HTML · GitHub (17★)
Papers citing "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models" (3 / 3 papers shown)
DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference
Yujie Zhang, Shivam Aggarwal, T. Mitra
MoE · 16 Dec 2024

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Andrii Skliar, T. V. Rozendaal, Romain Lepert, Todor Boinovski, M. V. Baalen, Markus Nagel, Paul N. Whatmough, B. Bejnordi
MoE · 27 Nov 2024

Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang
07 Apr 2024