Tutel: Adaptive Mixture-of-Experts at Scale

7 June 2022 · arXiv:2206.03382
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong
MoE

Papers citing "Tutel: Adaptive Mixture-of-Experts at Scale"

50 / 73 papers shown
Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields
Zhenxing Mi, Ping Yin, Xue Xiao, Dan Xu
MoE · 04 May 2025
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin, Jingrong Chen, Xinhao Kong, Yongji Wu, Liang Luo, Zhilin Wang, Ying Zhang, Tingjun Chen, Alvin R. Lebeck, Danyang Zhuo
02 May 2025
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos, Mark Zhao, Swapnil Gandhi, Thomas Norrie, Shrijeet Mukherjee, Christos Kozyrakis
MoE · 28 Apr 2025
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, V. Korthikanti, ..., Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, J. Yang
MoE · 21 Apr 2025
SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training
Zheng Li, Yong-Jin Liu, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song, Di Zhang
20 Apr 2025
MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications
Aashaka Shah, Abhinav Jangda, Yangqiu Song, Caio Rocha, Changho Hwang, ..., Peng Cheng, Qinghua Zhou, Roshan Dathathri, Saeed Maleki, Ziyue Yang
GNN · 11 Apr 2025
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Yongji Wu, Xueshen Liu, Shuowei Jin, Ceyu Xu, Feng Qian, Ziming Mao, Matthew Lentz, Danyang Zhuo, Ion Stoica
MoMe · MoE · 04 Apr 2025
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
MoE · 02 Apr 2025
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
Chenpeng Wu, Qiqi Gu, Heng Shi, Jianguo Yao, Haibing Guan
MoE · 13 Mar 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer
MoE · 10 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin
MoE · 10 Mar 2025
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
Seokjin Go, Divya Mahajan
MoE · 10 Feb 2025
Importance Sampling via Score-based Generative Models
Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana
MedIm · DiffM · 07 Feb 2025
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
Zihan Qiu, Zeyu Huang, Jian Xu, Kaiyue Wen, Zekun Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, Junyang Lin
MoE · 21 Jan 2025
Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
Ziyue Luo, Jia-Wei Liu, Myungjin Lee, Ness B. Shroff
09 Jan 2025
ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization
Kourosh Darvish, Marta Skreta, Yuchi Zhao, Naruki Yoshikawa, Sagnik Som, ..., Han Hao, Haoping Xu, Alán Aspuru-Guzik, Animesh Garg, Florian Shkurti
08 Jan 2025
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
Yuhao Wang, Y. Liu, Aihua Zheng, Pingping Zhang
14 Dec 2024
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions
Guanyu Zhou, Xiaohan Yu, Wenxin Huang, Xuemei Jia, Xian Zhong, Chia-Wen Lin
CML · 24 Nov 2024
Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
Fahao Chen, Peng Li, Zicong Hong, Zhou Su, Song Guo
MoMe · MoE · 23 Nov 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang, Haoyi Zhu, Yunhong Wang, Gangshan Wu, Tong He, Limin Wang
21 Nov 2024
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Xiaonan Nie, Qibin Liu, Fangcheng Fu, Shenhan Zhu, Xupeng Miao, X. Li, Yuhang Zhang, Shouda Liu, Bin Cui
MoE · 13 Nov 2024
HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo, Jie Peng, Pingzhi Li, Tianlong Chen
MoE · 02 Nov 2024
MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization
J. Guo, Yan Liu, Yu Meng, Zhiwei Tao, Banglan Liu, Gang Chen, Xiang Li
MoE · 01 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham
MoE · 01 Nov 2024
Stealing User Prompts from Mixture of Experts
Itay Yona, Ilia Shumailov, Jamie Hayes, Nicholas Carlini
MoE · 30 Oct 2024
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, Qi Tian
MoE · 21 Oct 2024
Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou
MoE · 04 Oct 2024
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang
DiffM · 26 Aug 2024
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, ..., Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou
26 Aug 2024
Understanding the Performance and Estimating the Cost of LLM Fine-Tuning
Yuchen Xia, Jiho Kim, Yuhan Chen, Haojie Ye, Souvik Kundu, Cong Hao, Nishil Talati
MoE · 08 Aug 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement
Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo
05 Jul 2024
Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation
Nadezhda Chirkova, Vassilina Nikoulina, Jean-Luc Meunier, Alexandre Berard
MoE · 01 Jul 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang
MoE · 01 Jul 2024
Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
Xinglin Pan, Wenxiang Lin, S. Shi, Xiaowen Chu, Weinong Sun, Bo Li
MoE · 30 Jun 2024
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang
MQ · 23 May 2024
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin
MoE · 23 May 2024
USP: A Unified Sequence Parallelism Approach for Long Context Generative AI
Jiarui Fang, Shangchun Zhao
13 May 2024
Towards a Flexible and High-Fidelity Approach to Distributed DNN Training Emulation
Banruo Liu, M. Ojewale, Yuhan Ding, Marco Canini
05 May 2024
Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
MoMe · 30 Apr 2024
Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing
Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini, Massimo Bertozzi, Andrea Prati
29 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu-Xiang Wang
22 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang
07 Apr 2024
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han, Shuai Zhang, Xingjian Shi, Markus Reichstein
01 Apr 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang, B. Cardiff, Antoine Frappé, Benoît Larras, Deepu John
26 Mar 2024
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers
Ka Man Lo, Yiming Liang, Wenyu Du, Yuantao Fan, Zili Wang, Wenhao Huang, Lei Ma, Jie Fu
MoE · 26 Feb 2024
Routers in Vision Mixture of Experts: An Empirical Study
Tianlin Liu, Mathieu Blondel, C. Riquelme, J. Puigcerver
MoE · 29 Jan 2024
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
Jinghan Yao, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda
MoE · 16 Jan 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
23 Dec 2023
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, ..., Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
ALM · ELM · 23 Dec 2023