2408.10284
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
19 August 2024
Shuzhang Zhong
Ling Liang
Yuan Wang
Runsheng Wang
Ru Huang
Meng Li
MoE
Papers citing
"AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference"
3 / 3 papers shown
Title
D²MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang
Qihua Zhou
Zicong Hong
Song Guo
MoE
58
0
0
17 Apr 2025
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li
Pengfei Zheng
Shuang Chen
Zewei Xu
Yuanhao Lai
Yunfei Du
Zehao Wang
MoE
137
0
0
06 Mar 2025
DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference
Yujie Zhang
Shivam Aggarwal
T. Mitra
MoE
76
0
0
16 Dec 2024