DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
arXiv:2201.05596 · 14 January 2022
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He

Cited By
Papers citing "DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale" (showing 50 of 189)
UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang, Chaozheng Wang, Jing Li · MoE · 12 May 2025

QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
HamidReza Imani, Jiaxin Peng, Peiman Mohseni, Abdolah Amirany, Tarek A. El-Ghazawi · MoE · 10 May 2025

Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang, Luohe Shi, Qiwei Li, Zuchao Li, Ping Wang, Bo Du, Mengjia Shen, Hai Zhao · MoE · 06 May 2025

Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields
Zhenxing Mi, Ping Yin, Xue Xiao, Dan Xu · MoE · 04 May 2025

Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang · LLMAG · 28 Apr 2025

Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos, Mark Zhao, Swapnil Gandhi, Thomas Norrie, Shrijeet Mukherjee, Christos Kozyrakis · MoE · 28 Apr 2025

The Big Send-off: High Performance Collectives on GPU-based Supercomputers
Siddharth Singh, Mahua Singh, A. Bhatele · 25 Apr 2025

MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, V. Korthikanti, ..., Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, J. Yang · MoE · 21 Apr 2025

Scalability and Maintainability Challenges and Solutions in Machine Learning: Systematic Literature Review
Karthik Shivashankar, Ghadi S. Al Hajj, Antonio Martini · 15 Apr 2025

Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations
Zican Dong, Han Peng, Peiyu Liu, Wayne Xin Zhao, Dong Wu, Feng Xiao, Zhilin Wang · MoE · 09 Apr 2025

HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Yongji Wu, Xueshen Liu, Shuowei Jin, Ceyu Xu, Feng Qian, Ziming Mao, Matthew Lentz, Danyang Zhuo, Ion Stoica · MoMe, MoE · 04 Apr 2025

MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
Beichen Huang, Yueming Yuan, Zelei Shao, Minjia Zhang · MQ, MoE · 03 Apr 2025

Adaptive Layer-skipping in Pre-trained LLMs
Xuan Luo, Weizhi Wang, Xifeng Yan · 31 Mar 2025

Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Raman Dutt, Harleen Hanspal, Guoxuan Xia, Petru-Daniel Tudosiu, Alexander Black, Yongxin Yang, Steven G. McDonagh, Sarah Parisot · MoE · 28 Mar 2025

Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning
Sashuai Zhou, Hai Huang, Yan Xia · MoMe, MoE · 26 Mar 2025

Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
Zhanda Zhu, Christina Giannoula, Muralidhar Andoorveedu, Qidong Su, Karttikeya Mangalam, Bojian Zheng, Gennady Pekhimenko · VLM, MoE · 24 Mar 2025

Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
Chenpeng Wu, Qiqi Gu, Heng Shi, Jianguo Yao, Haibing Guan · MoE · 13 Mar 2025

Accelerating MoE Model Inference with Expert Sharding
Oana Balmau, Anne-Marie Kermarrec, Rafael Pires, André Loureiro Espírito Santo, M. Vos, Milos Vujasinovic · MoE · 11 Mar 2025

eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer · MoE · 10 Mar 2025

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin · MoE · 10 Mar 2025

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team, B. Zeng, Chenyu Huang, Chao Zhang, Changxin Tian, ..., Zhaoxin Huan, Zujie Wen, Zhenhang Sun, Zhuoxuan Du, Z. He · MoE, ALM · 07 Mar 2025

Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li, Pengfei Zheng, Shuang Chen, Zewei Xu, Yuanhao Lai, Yunfei Du, Z. Wang · MoE · 06 Mar 2025

Continual Pre-training of MoEs: How robust is your router?
Benjamin Thérien, Charles-Étienne Joseph, Zain Sarwar, Ashwinee Panda, Anirban Das, Shi-Xiong Zhang, Stephen Rawls, Shri Kiran Srinivasan, Eugene Belilovsky, Irina Rish · MoE · 06 Mar 2025

CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo · 04 Mar 2025

Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting
Xiaobin Hong, J. Zhang, Wenzhong Li, Sanglu Lu, Jiajun Li · AI4TS · 03 Mar 2025

CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, R. Teo, T. Nguyen, Linh Duy Tran · MoMe · 26 Feb 2025

MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
Seokjin Go, Divya Mahajan · MoE · 10 Feb 2025

Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
Zhiyuan Fang, Yuegui Huang, Zicong Hong, Yufeng Lyu, Wuhui Chen, Yue Yu, Fan Yu, Zibin Zheng · MoE · 09 Feb 2025

fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving
Hanfei Yu, Xingqi Cui, H. M. Zhang, Hairu Wang, Hao Wang · MoE · 07 Feb 2025

Importance Sampling via Score-based Generative Models
Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana · MedIm, DiffM · 07 Feb 2025

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
Zihan Qiu, Zeyu Huang, Jian Xu, Kaiyue Wen, Zekun Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, Junyang Lin · MoE · 21 Jan 2025

UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
Jingbo Lin, Zhilu Zhang, W. J. Li, Renjing Pei, Hang Xu, Hongzhi Zhang, Wangmeng Zuo · 28 Dec 2024

Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Yining Qi, Yunfeng Fan, Qinliang Su, Xuemin Shen · AI4CE · 18 Dec 2024

Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation
Svetlana Pavlitska, Enrico Eisen, J. Marius Zöllner · AAML, UQCV, MoE · 16 Dec 2024

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu-Xi Cheng · MoE · 24 Nov 2024

Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
Fahao Chen, Peng Li, Zicong Hong, Zhou Su, Song Guo · MoMe, MoE · 23 Nov 2024

LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Xiaonan Nie, Qibin Liu, Fangcheng Fu, Shenhan Zhu, Xupeng Miao, X. Li, Yuhang Zhang, Shouda Liu, Bin Cui · MoE · 13 Nov 2024

LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Krishna Teja Chitty-Venkata, Siddhisanket Raskar, B. Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, M. Emani, V. Vishwanath · 31 Oct 2024

Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach · MoE · 24 Oct 2024

ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, Qi Tian · MoE · 21 Oct 2024

CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su, Xing Wu, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen, Songlin Hu, Guiguang Ding · MoE · 21 Oct 2024

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
Yulei Qian, Fengcun Li, Xiangyang Ji, Xiaoyu Zhao, Jianchao Tan, Kaipeng Zhang, Xunliang Cai · MoE · 16 Oct 2024

MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan · MoE · 15 Oct 2024

Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yuxuan Liang, Roger Zimmermann, Chenghao Liu, Silvio Savarese, Caiming Xiong, Doyen Sahoo · AI4TS · 14 Oct 2024

MoIN: Mixture of Introvert Experts to Upcycle an LLM
Ajinkya Tejankar, K. Navaneet, Ujjawal Panchal, Kossar Pourahmadi, Hamed Pirsiavash · MoE · 13 Oct 2024

Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
Haiyue Ma, Jian Liu, Ronny Krashinsky · 10 Oct 2024

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan · MoE · 09 Oct 2024

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
Siqi Wang, Zhengyu Chen, Bei Li, Keqing He, Min Zhang, Jingang Wang · 08 Oct 2024

Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou · MoE · 04 Oct 2024

SciDFM: A Large Language Model with Mixture-of-Experts for Science
Liangtai Sun, Danyu Luo, Da Ma, Zihan Zhao, Baocai Chen, Zhennan Shen, Su Zhu, Lu Chen, Xin Chen, Kai Yu · MoE · 27 Sep 2024