A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

10 March 2025
Siyuan Mu, Sen Lin
MoE
ArXiv · PDF · HTML

Papers citing "A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications"

50 / 203 papers shown
Out-of-Domain Generalization in Dynamical Systems Reconstruction
Niclas Alexander Göring, Florian Hess, Manuel Brenner, Zahra Monfared, Daniel Durstewitz
AI4CE
63 · 16 · 0 · 28 Feb 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL
J. Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob N. Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
99 · 36 · 0 · 13 Feb 2024
On Least Square Estimation in Softmax Gating Mixture of Experts
Huy Nguyen, Nhat Ho, Alessandro Rinaldo
73 · 15 · 0 · 05 Feb 2024
Continual Learning with Pre-Trained Models: A Survey
Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan
CLL · KELM
75 · 74 · 0 · 29 Jan 2024
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen, Zequn Jie, Lin Ma
MoE
116 · 55 · 0 · 29 Jan 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
MoE
62 · 98 · 0 · 29 Jan 2024
Divide and not forget: Ensemble of selectively trained experts in Continual Learning
Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan, Tomasz Trzciński, Bartosz Zieliński, Bartłomiej Twardowski
CLL
49 · 31 · 0 · 18 Jan 2024
Sharing Knowledge in Multi-Task Deep Reinforcement Learning
Carlo D'Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters
164 · 130 · 0 · 17 Jan 2024
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
Jinghan Yao, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda
MoE
51 · 14 · 0 · 16 Jan 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, ..., Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, W. Liang
MoE
82 · 289 · 0 · 11 Jan 2024
Mixtral of Experts
Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, A. Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
MoE · LLMAG
144 · 1,081 · 0 · 08 Jan 2024
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Róbert Csordás, Piotr Piekos, Kazuki Irie, Jürgen Schmidhuber
MoE
48 · 15 · 0 · 13 Dec 2023
Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts
Ahmed Hendawy, Jan Peters, Carlo D'Eramo
MoE
52 · 18 · 0 · 19 Nov 2023
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Yash Jain, Harkirat Singh Behl, Z. Kira, Vibhav Vineet
48 · 14 · 0 · 08 Nov 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar, Dan Alistarh
MQ · MoE
48 · 26 · 0 · 25 Oct 2023
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen, Pedram Akbarian, TrungTin Nguyen, Nhat Ho
66 · 12 · 0 · 22 Oct 2023
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition
Hamid Reza Mohammadi, Ehsan Nazerfard, Tahereh Firoozi
ViT
57 · 2 · 0 · 04 Oct 2023
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
Kemal Oksuz, Selim Kuzucu, Tom Joy, P. Dokania
MoE
92 · 7 · 0 · 26 Sep 2023
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts
Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho
MoE
77 · 17 · 0 · 25 Sep 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver, C. Riquelme, Basil Mustafa, N. Houlsby
MoE
154 · 120 · 0 · 02 Aug 2023
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
Hanrong Ye, Dan Xu
MoE
64 · 27 · 0 · 28 Jul 2023
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
Yehya Farhat, Mirian Hipolito Garcia, Fangshuo Liao, Ahmed Hassan Awadallah, Guoqing Zheng, Robert Sim, Dimitrios Dimitriadis, Anastasios Kyrillidis
FedML · MoE
36 · 6 · 0 · 14 Jun 2023
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
Mohammed Nowaz Rabbani Chowdhury, Shuai Zhang, Ming Wang, Sijia Liu, Pin-Yu Chen
MoE
66 · 18 · 0 · 07 Jun 2023
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zhangyang Wang, Cong Hao
MoE
81 · 19 · 0 · 30 May 2023
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo
DiffM
96 · 135 · 0 · 29 May 2023
Lifelong Language Pretraining with Distribution-Specialized Experts
Wuyang Chen, Yanqi Zhou, Nan Du, Yanping Huang, James Laudon, Zhifeng Chen, Claire Cui
KELM
68 · 49 · 0 · 20 May 2023
Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts
Huy Nguyen, TrungTin Nguyen, Khai Nguyen, Nhat Ho
MoE
73 · 13 · 0 · 12 May 2023
Demystifying Softmax Gating Function in Gaussian Mixture of Experts
Huy Nguyen, TrungTin Nguyen, Nhat Ho
53 · 23 · 0 · 05 May 2023
Revisiting Single-gated Mixtures of Experts
Amelie Royer, I. Karmanov, Andrii Skliar, B. Bejnordi, Tijmen Blankevoort
MoE · MoMe
56 · 6 · 0 · 11 Apr 2023
Segment Anything
A. Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, ..., Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross B. Girshick
MLLM · VLM
306 · 7,274 · 0 · 05 Apr 2023
GPT-4 Technical Report
OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph
LLMAG · MLLM
1.3K · 14,289 · 0 · 15 Mar 2023
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
Siddharth Singh, Olatunji Ruwase, A. A. Awan, Samyam Rajbhandari, Yuxiong He, A. Bhatele
MoE
50 · 32 · 0 · 11 Mar 2023
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Haiyang Huang, Newsha Ardalani, Anna Y. Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin C. Lee
MoE
80 · 24 · 0 · 10 Mar 2023
Improved Training of Mixture-of-Experts Language GANs
Yekun Chai, Qiyue Yin, Junge Zhang
GAN
33 · 5 · 0 · 23 Feb 2023
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
Dhawal Gupta, Yinlam Chow, Aza Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
OffRL
35 · 3 · 0 · 21 Feb 2023
PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination
Xingzhou Lou, Jiaxian Guo, Junge Zhang, Jun Wang, Kaiqi Huang, Yali Du
37 · 29 · 0 · 16 Jan 2023
Scalable Diffusion Models with Transformers
William S. Peebles, Saining Xie
GNN
77 · 2,298 · 0 · 19 Dec 2022
Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer
Sen Lin, Li Yang, Deliang Fan, Junshan Zhang
CLL
147 · 46 · 0 · 01 Nov 2022
M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang
MoE
62 · 86 · 0 · 26 Oct 2022
PaCo: Parameter-Compositional Multi-Task Reinforcement Learning
Lingfeng Sun, Haichao Zhang, Wei Xu, Masayoshi Tomizuka
MoE
65 · 40 · 0 · 21 Oct 2022
Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors
Niklas Freymuth, Nicolas Schreiber, P. Becker, Aleksander Taranovic, Gerhard Neumann
30 · 7 · 0 · 17 Oct 2022
Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts
Tao Zhong, Zhixiang Chi, Li Gu, Yang Wang, Yuanhao Yu, Jingshan Tang
OOD
116 · 32 · 0 · 08 Oct 2022
Mixture of experts models for multilevel data: modelling framework and approximation theory
Tsz Chai Fung, Spark C. Tseung
40 · 3 · 0 · 30 Sep 2022
YOLOV: Making Still Image Object Detectors Great at Video Object Detection
Yuheng Shi, Naiyan Wang, Xiaojie Guo
ObjD · 3DH
52 · 51 · 0 · 20 Aug 2022
Towards Understanding Mixture of Experts in Deep Learning
Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li
MLT · MoE
66 · 54 · 0 · 04 Aug 2022
Adaptive Mixture of Experts Learning for Generalizable Face Anti-Spoofing
Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Ran Yi, Shouhong Ding, Lizhuang Ma
OOD · CVBM
41 · 49 · 0 · 20 Jul 2022
CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One
Liyuan Wang, Xingxing Zhang, Qian Li, Jun Zhu, Yi Zhong
CLL
52 · 48 · 0 · 13 Jul 2022
No Language Left Behind: Scaling Human-Centered Machine Translation
NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, ..., Alexandre Mourachko, C. Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang
MoE
213 · 1,258 · 0 · 11 Jul 2022
Adaptive Expert Models for Personalization in Federated Learning
Martin Isaksson, Edvin Listo Zec, R. Coster, D. Gillblad, Šarūnas Girdzijauskas
FedML
29 · 5 · 0 · 15 Jun 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai
MoMe · MoE
56 · 69 · 0 · 09 Jun 2022