DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

14 January 2022
Samyam Rajbhandari
Conglong Li
Z. Yao
Minjia Zhang
Reza Yazdani Aminabadi
A. A. Awan
Jeff Rasley
Yuxiong He
arXiv:2201.05596

Papers citing "DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale"

50 of 189 citing papers shown
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
Maryam Akhavan Aghdam
Hongpeng Jin
Yanzhao Wu
MoE
10 Sep 2024
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Sungmin Yun
Kwanhee Kyung
Juhwan Cho
Jaewan Choi
Jongmin Kim
Byeongho Kim
Sukhan Lee
Kyomin Sohn
Jung Ho Ahn
MoE
02 Sep 2024
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
Wei An
Xiao Bi
Guanting Chen
Shanhuang Chen
Chengqi Deng
...
Chenggang Zhao
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Yuheng Zou
26 Aug 2024
FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts
Hanzi Mei
Dongqi Cai
Ao Zhou
Shangguang Wang
Mengwei Xu
MoE
21 Aug 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu
Zeyu Huang
Shuang Cheng
Yizhi Zhou
Zili Wang
Ivan Titov
Jie Fu
MoE
13 Aug 2024
Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization
Changtao Miao
Qi Chu
Tao Gong
Zhentao Tan
Zhenchao Jin
Wanyi Zhuang
Man Luo
Honggang Hu
Nenghai Yu
CVBM
05 Aug 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
29 Jul 2024
Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives
D. Hagos
Rick Battle
Danda B. Rawat
LM&MA
OffRL
20 Jul 2024
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service
HamidReza Imani
Abdolah Amirany
Tarek A. El-Ghazawi
MoE
19 Jul 2024
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
DiffM
MoE
16 Jul 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
15 Jul 2024
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts
Zhenpeng Su
Zijia Lin
Xue Bai
Xing Wu
Yizhe Xiong
...
Guangyuan Ma
Hui Chen
Guiguang Ding
Wei Zhou
Songlin Hu
MoE
13 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
V. Cevher
Yida Wang
George Karypis
12 Jul 2024
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement
Yongji Wu
Wenjie Qu
Tianyang Tao
Zhuang Wang
Wei Bai
Zhuohao Li
Yuan Tian
Jiaheng Zhang
Matthew Lentz
Danyang Zhuo
05 Jul 2024
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
Yixiao Wang
Yifei Zhang
Mingxiao Huo
Ran Tian
Xiang Zhang
...
Chenfeng Xu
Pengliang Ji
Wei Zhan
Mingyu Ding
Masayoshi Tomizuka
MoE
01 Jul 2024
Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
Xinglin Pan
Wenxiang Lin
S. Shi
Xiaowen Chu
Weinong Sun
Bo Li
MoE
30 Jun 2024
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MoE
28 Jun 2024
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
Tong Zhu
Xiaoye Qu
Daize Dong
Jiacheng Ruan
Jingqi Tong
Conghui He
Yu Cheng
MoE
ALM
24 Jun 2024
FastPersist: Accelerating Model Checkpointing in Deep Learning
Guanhua Wang
Olatunji Ruwase
Bing Xie
Yuxiong He
19 Jun 2024
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Haoze Wu
Zihan Qiu
Zili Wang
Hang Zhao
Jie Fu
MoE
18 Jun 2024
Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction
Wenzhao Jiang
Jindong Han
Hao Liu
Tao Tao
Naiqiang Tan
Hui Xiong
MoE
14 Jun 2024
QuantMoE-Bench: Examining Post-Training Quantization for Mixture-of-Experts
Pingzhi Li
Xiaolong Jin
Yu Cheng
Tianlong Chen
MQ
MoE
12 Jun 2024
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Yixin Song
Haotong Xie
Zhengyan Zhang
Bo Wen
Li Ma
Zeyu Mi
Haibo Chen
MoE
10 Jun 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu
Yiran Guan
Dingkang Liang
Yuchao Chen
Yuliang Liu
Xiang Bai
MoE
07 Jun 2024
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso
Quentin G. Anthony
Yury Tokpanov
James Whittington
Jonathan Pilault
Adam Ibrahim
Beren Millidge
26 May 2024
Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
Shen Yuan
Haotian Liu
Hongteng Xu
24 May 2024
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo
Zhenglin Cheng
Xiaoying Tang
Tao Lin
MoE
23 May 2024
A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts
Xinru Zhang
N. Ou
Berke Doga Basaran
Marco Visentin
Mengyun Qiao
...
Ouyang Cheng
Yaou Liu
Paul M. Matthew
Chuyang Ye
Wenjia Bai
MedIm
16 May 2024
A Survey on Transformers in NLP with Focus on Efficiency
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
15 May 2024
Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
Chenyu Jiang
Ye Tian
Zhen Jia
Shuai Zheng
Chuan Wu
Yida Wang
MoMe
30 Apr 2024
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts
Dengchun Li
Yingzi Ma
Naizheng Wang
Zhengmao Ye
Zhiyuan Cheng
...
Yan Zhang
Lei Duan
Jie Zuo
Cal Yang
Mingjie Tang
MoE
22 Apr 2024
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee
Donghyun Lee
Genghan Zhang
Mo Tiwari
Azalia Mirhoseini
12 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
07 Apr 2024
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
Bin Gao
Zhuomin He
Puru Sharma
Qingxuan Kang
Djordje Jevdjic
Junbo Deng
Xingkun Yang
Zhou Yu
Pengfei Zuo
23 Mar 2024
Conditional computation in neural networks: principles and research trends
Simone Scardapane
Alessandro Baiocchi
Alessio Devoto
V. Marsocci
Pasquale Minervini
Jary Pomponi
12 Mar 2024
DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
Shanghaoran Quan
MoE
OffRL
02 Mar 2024
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation
Liang Luo
Buyun Zhang
Michael Tsang
Yinbin Ma
Ching-Hsiang Chu
...
Guna Lakshminarayanan
Ellie Wen
Jongsoo Park
Dheevatsa Mudigere
Maxim Naumov
01 Mar 2024
XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection
Yuanhang Yang
Shiyi Qi
Wenchao Gu
Chaozheng Wang
Cuiyun Gao
Zenglin Xu
MoE
27 Feb 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
26 Feb 2024
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
Hao Zhao
Zihan Qiu
Huijia Wu
Zili Wang
Zhaofeng He
Jie Fu
MoE
20 Feb 2024
Unraveling Complex Data Diversity in Underwater Acoustic Target Recognition through Convolution-based Mixture of Experts
Yuan Xie
Jiawei Ren
Ji Xu
19 Feb 2024
Turn Waste into Worth: Rectifying Top-$k$ Router of MoE
Zhiyuan Zeng
Qipeng Guo
Zhaoye Fei
Zhangyue Yin
Yunhua Zhou
Linyang Li
Tianxiang Sun
Hang Yan
Dahua Lin
Xipeng Qiu
MoE
MoMe
17 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
15 Feb 2024
Differentially Private Training of Mixture of Experts Models
Pierre Tholoniat
Huseyin A. Inan
Janardhan Kulkarni
Robert Sim
MoE
11 Feb 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori
Tian Tang
Yile Gu
Kan Zhu
Baris Kasikci
10 Feb 2024
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang
Yixin Song
Guanghui Yu
Xu Han
Yankai Lin
Chaojun Xiao
Chenyang Song
Zhiyuan Liu
Zeyu Mi
Maosong Sun
06 Feb 2024
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
Bharat Runwal
Tejaswini Pedapati
Pin-Yu Chen
MoE
02 Feb 2024
BlackMamba: Mixture of Experts for State-Space Models
Quentin G. Anthony
Yury Tokpanov
Paolo Glorioso
Beren Millidge
01 Feb 2024
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
Suchita Pati
Shaizeen Aga
Mahzabeen Islam
Nuwan Jayasena
Matthew D. Sinclair
30 Jan 2024
LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li
Zhijie Sun
Xuan He
Li Zeng
Yi Lin
Entong Li
Binfan Zheng
Rongqian Zhao
Xin Chen
MoE
25 Jan 2024