Tutel: Adaptive Mixture-of-Experts at Scale (arXiv:2206.03382)

7 June 2022
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong
Topics: MoE

Papers citing "Tutel: Adaptive Mixture-of-Experts at Scale"

23 of 73 citing papers are shown below.

From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge
18 Dec 2023

MoEC: Mixture of Experts Implicit Neural Compression
Jianchen Zhao, Cheng-Ching Tseng, Ming Lu, Ruichuan An, Xiaobao Wei, He Sun, Shanghang Zhang
03 Dec 2023

DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Yash Jain, Harkirat Singh Behl, Z. Kira, Vibhav Vineet
08 Nov 2023

SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai Helen Li, Yiran Chen
Topics: MoE
29 Oct 2023

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar, Dan Alistarh
Topics: MQ, MoE
25 Oct 2023

Direct Neural Machine Translation with Task-level Mixture of Experts models
Isidora Chara Tourni, Subhajit Naskar
Topics: MoE
18 Oct 2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang
Topics: MoE
23 Aug 2023

Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Yongqian Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang
Topics: MoMe
11 Aug 2023

Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
Sheng Shen, Le Hou, Yan-Quan Zhou, Nan Du, Shayne Longpre, ..., Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou
Topics: ALM, MoE
24 May 2023

Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
Chong Yu, Tao Chen, Zhongxue Gan, Jiayuan Fan
Topics: MQ, ViT
18 May 2023

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang-Ming Cao, Bin Cui
Topics: MoE
08 Apr 2023

UKP-SQuARE v3: A Platform for Multi-Agent QA Research
Haritz Puerto, Tim Baumgärtner, Rachneet Sachdeva, Haishuo Fang, Haotian Zhang, Sewin Tariverdian, Kexin Wang, Iryna Gurevych
31 Mar 2023

Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen, Z. Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He
Topics: VLM, MoE
13 Mar 2023

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
Siddharth Singh, Olatunji Ruwase, A. A. Awan, Samyam Rajbhandari, Yuxiong He, A. Bhatele
Topics: MoE
11 Mar 2023

Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Haiyang Huang, Newsha Ardalani, Anna Y. Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin C. Lee
Topics: MoE
10 Mar 2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Ningxin Zheng, Huiqiang Jiang, Quan Zhang, Zhenhua Han, Yuqing Yang, ..., Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou
26 Jan 2023

Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation
Maha Elbayad, Anna Y. Sun, Shruti Bhosale
Topics: MoE
15 Dec 2022

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia
Topics: MoE
29 Nov 2022

A Review of Sparse Expert Models in Deep Learning
W. Fedus, J. Dean, Barret Zoph
Topics: MoE
04 Sep 2022

HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System
Xiaonan Nie, Pinxue Zhao, Xupeng Miao, Tong Zhao, Bin Cui
Topics: MoE
28 Mar 2022

M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
Junyang Lin, An Yang, Jinze Bai, Chang Zhou, Le Jiang, ..., Jie Zhang, Yong Li, Wei Lin, Jingren Zhou, Hongxia Yang
Topics: MoE
08 Oct 2021

Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
Topics: MoE
22 Sep 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020