ResearchTrend.AI
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference

8 April 2025
Shuzhang Zhong
Yizhou Sun
Ling Liang
Runsheng Wang
R. Huang
Meng Li
    MoE

Papers citing "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"

3 / 3 papers shown
MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
Zihao Zheng
Xiuping Cui
Size Zheng
Maoliang Li
Jiayu Chen
Yun Liang
Xiang Chen
MQ, MoE
125
0
0
27 Mar 2025
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song
Zihang Zhong
Rong Chen
Haibo Chen
MoE
127
6
0
29 Oct 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori
Tian Tang
Yile Gu
Kan Zhu
Baris Kasikci
143
24
0
10 Feb 2024