ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2411.01016
  4. Cited By
MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert
  Pruning and Intra-Expert Low-Rank Decomposition

MoE-I2^22: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition

1 November 2024
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
Yuanlin Duan
Wenqi Jia
Miao Yin
Yu Cheng
Bo Yuan
    MoE
ArXivPDFHTML

Papers citing "MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition"

3 / 3 papers shown
Title
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
HamidReza Imani
Jiaxin Peng
Peiman Mohseni
Abdolah Amirany
Tarek A. El-Ghazawi
MoE
31
0
0
10 May 2025
Faster MoE LLM Inference for Extremely Large Models
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
68
0
0
06 May 2025
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
...
Jinghua Yan
Y. Bai
P. Sadayappan
Xia Hu
Bo Yuan
VLM
64
0
0
24 Mar 2025
1