2503.07137
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
10 March 2025
Siyuan Mu
Sen Lin
MoE
Papers citing
"A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications"
50 / 203 papers shown
Title
Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang
Wei Cui
Yifan Xiong
Ziyue Yang
Ze Liu
...
Joe Chau
Peng Cheng
Fan Yang
Mao Yang
Y. Xiong
MoE
164
118
0
07 Jun 2022
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Basil Mustafa
C. Riquelme
J. Puigcerver
Rodolphe Jenatton
N. Houlsby
VLM
MoE
152
196
0
06 Jun 2022
Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble
Zhengyu Yang
Kan Ren
Xufang Luo
Minghuan Liu
Weiqing Liu
Jiang Bian
Weinan Zhang
Dongsheng Li
47
21
0
19 May 2022
One-shot Federated Learning without Server-side Training
Shangchao Su
Bin Li
Xiangyang Xue
FedML
35
28
0
26 Apr 2022
Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability
Svetlana Pavlitska
Christian Hubschneider
Lukas Struppek
J. Marius Zöllner
MoE
53
12
0
22 Apr 2022
Residual Mixture of Experts
Lemeng Wu
Mengchen Liu
Yinpeng Chen
Dongdong Chen
Xiyang Dai
Lu Yuan
MoE
75
36
0
20 Apr 2022
On the Representation Collapse of Sparse Mixture of Experts
Zewen Chi
Li Dong
Shaohan Huang
Damai Dai
Shuming Ma
...
Payal Bajaj
Xia Song
Xian-Ling Mao
Heyan Huang
Furu Wei
MoMe
MoE
60
104
0
20 Apr 2022
Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners
Shashank Gupta
Subhabrata Mukherjee
K. Subudhi
Eduardo Gonzalez
Damien Jose
Ahmed Hassan Awadallah
Jianfeng Gao
MoE
51
49
0
16 Apr 2022
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
102
250
0
07 Apr 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
247
30,108
0
01 Mar 2022
Continual Learning Beyond a Single Model
T. Doan
Seyed Iman Mirzadeh
Mehrdad Farajtabar
CLL
52
16
0
20 Feb 2022
Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou
Tao Lei
Hanxiao Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
277
355
0
18 Feb 2022
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph
Irwan Bello
Sameer Kumar
Nan Du
Yanping Huang
J. Dean
Noam M. Shazeer
W. Fedus
MoE
183
191
0
17 Feb 2022
TRGP: Trust Region Gradient Projection for Continual Learning
Sen Lin
Li Yang
Deliang Fan
Junshan Zhang
CLL
109
77
0
07 Feb 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari
Conglong Li
Z. Yao
Minjia Zhang
Reza Yazdani Aminabadi
A. A. Awan
Jeff Rasley
Yuxiong He
92
298
0
14 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
376
15,454
0
20 Dec 2021
Learning to Prompt for Continual Learning
Zifeng Wang
Zizhao Zhang
Chen-Yu Lee
Han Zhang
Ruoxi Sun
Xiaoqi Ren
Guolong Su
Vincent Perot
Jennifer Dy
Tomas Pfister
CLL
VPVLM
KELM
VLM
86
773
0
16 Dec 2021
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Nan Du
Yanping Huang
Andrew M. Dai
Simon Tong
Dmitry Lepikhin
...
Kun Zhang
Quoc V. Le
Yonghui Wu
Zhifeng Chen
Claire Cui
ALM
MoE
205
812
0
13 Dec 2021
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu
Dong Chen
Jianmin Bao
Fang Wen
Bo Zhang
Dongdong Chen
Lu Yuan
B. Guo
DiffM
123
791
0
29 Nov 2021
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
Sam Bond-Taylor
P. Hessey
Hiroshi Sasaki
T. Breckon
Chris G. Willcocks
DiffM
74
72
0
24 Nov 2021
Federated Social Recommendation with Graph Neural Network
Zhiwei Liu
Liangwei Yang
Ziwei Fan
Hao Peng
Philip S. Yu
FedML
65
154
0
21 Nov 2021
Mixture-of-Variational-Experts for Continual Learning
Y. Yin
Yu Wang
CLL
FedML
41
6
0
25 Oct 2021
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta
Yanping Huang
Ankur Bapna
M. Krikun
Dmitry Lepikhin
Minh-Thang Luong
Orhan Firat
MoE
243
109
0
24 Sep 2021
Personalised Federated Learning: A Combinational Approach
Sone Kyaw Pye
Han Yu
FedML
26
5
0
22 Aug 2021
ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
Patrick Esser
Robin Rombach
A. Blattmann
Bjorn Ommer
DiffM
74
160
0
19 Aug 2021
Federated Mixture of Experts
M. Reisser
Christos Louizos
E. Gavves
Max Welling
FedML
64
24
0
14 Jul 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
373
10,273
0
17 Jun 2021
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
96
600
0
10 Jun 2021
DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning
Hussein Hazimeh
Zhe Zhao
Aakanksha Chowdhery
M. Sathiamoorthy
Yihua Chen
Rahul Mazumder
Lichan Hong
Ed H. Chi
MoE
138
144
0
07 Jun 2021
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
Yongxing Dai
Xiaotong Li
Jun Liu
Zekun Tong
Ling-yu Duan
OOD
50
128
0
19 May 2021
RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling
Yizhe Zhang
Siqi Sun
Xiang Gao
Yuwei Fang
Chris Brockett
Michel Galley
Jianfeng Gao
Bill Dolan
RALM
84
33
0
14 May 2021
Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning
Jie Ren
Yewen Li
Zihan Ding
Wei Pan
Hao Dong
BDL
MoE
44
26
0
19 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
828
29,341
0
26 Feb 2021
Simple multi-dataset detection
Xingyi Zhou
V. Koltun
Philipp Krahenbuhl
ObjD
270
117
0
25 Feb 2021
Multi-Task Reinforcement Learning with Context-based Representations
Shagun Sodhani
Amy Zhang
Joelle Pineau
63
190
0
11 Feb 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus
Barret Zoph
Noam M. Shazeer
MoE
83
2,178
0
11 Jan 2021
PFL-MoE: Personalized Federated Learning Based on Mixture of Experts
Binbin Guo
Yuan Mei
Danyang Xiao
Weigang Wu
Ye Yin
Hongli Chang
MoE
83
22
0
31 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
557
40,961
0
22 Oct 2020
A Brief Review of Domain Adaptation
Abolfazl Farahani
Sahar Voghoei
Khaled Rasheed
H. Arabnia
OOD
47
539
0
07 Oct 2020
Specialized federated learning using a mixture of experts
Edvin Listo Zec
Olof Mogren
John Martinsson
L. R. Sütfeld
D. Gillblad
FedML
50
29
0
05 Oct 2020
The Computational Limits of Deep Learning
Neil C. Thompson
Kristjan Greenewald
Keeheon Lee
Gabriel F. Manso
VLM
46
525
0
10 Jul 2020
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
Kimin Lee
Michael Laskin
A. Srinivas
Pieter Abbeel
OffRL
48
203
0
09 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhifeng Chen
MoE
86
1,162
0
30 Jun 2020
Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts
R. Akrour
Davide Tateo
Jan Peters
32
22
0
10 Jun 2020
An Efficient Framework for Clustered Federated Learning
Avishek Ghosh
Jichan Chung
Dong Yin
Kannan Ramchandran
FedML
65
857
0
07 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
708
41,894
0
28 May 2020
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
544
2,022
0
04 May 2020
Federated Multi-view Matrix Factorization for Personalized Recommendations
Adrian Flanagan
Were Oyomno
A. Grigorievskiy
K. E. Tan
Suleiman A. Khan
Muhammad Ammad-ud-din
FedML
43
71
0
08 Apr 2020
Multi-Task Reinforcement Learning with Soft Modularization
Ruihan Yang
Huazhe Xu
Yi Wu
Xiaolong Wang
52
181
0
30 Mar 2020
Deep Reinforcement Learning for Autonomous Driving: A Survey
B. R. Kiran
Ibrahim Sobh
V. Talpaert
Patrick Mannion
A. A. Sallab
S. Yogamani
P. Pérez
327
1,681
0
02 Feb 2020