AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference

19 August 2024

Ru Huang

Meng Li

Papers citing "AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference"

Title
D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving Haodong Wang Qihua Zhou Zicong Hong Song Guo MoE 58 0 0 17 Apr 2025
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling Yan Li Pengfei Zheng Shuang Chen Zewei Xu Yuanhao Lai Yunfei Du Z. Wang MoE 137 0 0 06 Mar 2025
DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference Yujie Zhang Shivam Aggarwal T. Mitra MoE 74 0 0 16 Dec 2024