HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
arXiv: 2504.05897 · 8 April 2025
Authors: Shuzhang Zhong, Yizhou Sun, Ling Liang, Runsheng Wang, R. Huang, Meng Li
Topic: MoE
Links: arXiv (abs) · PDF · HTML
Papers citing "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" (3 of 3 shown)
1. MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
   Zihao Zheng, Xiuping Cui, Size Zheng, Maoliang Li, Jiayu Chen, Yun Liang, Xiang Chen
   Topics: MQ, MoE · 27 Mar 2025
2. ProMoE: Fast MoE-based LLM Serving using Proactive Caching
   Xiaoniu Song, Zihang Zhong, Rong Chen, Haibo Chen
   Topic: MoE · 29 Oct 2024
3. Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
   Keisuke Kamahori, Tian Tang, Yile Gu, Kan Zhu, Baris Kasikci
   10 Feb 2024