XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection
arXiv:2403.18926 · 27 February 2024
Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu
MoE

Papers citing "XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection" (9 papers shown)

UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang, Chaozheng Wang, Jing Li
MoE · 12 May 2025

FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks
Wenjing Xiao, Wenhao Song, Miaojiang Chen, Ruikun Luo, Min Chen
MoE · 29 Apr 2025

Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Yujiao Yang, Jing Lian, Linhui Li
MoE · 04 Mar 2025

Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning
Peizhuang Cong, Wenpu Liu, Wenhan Yu, Haochen Zhao, Tong Yang
ALM, MoE · 06 Feb 2025

CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su, Xing Wu, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen, Songlin Hu, Guiguang Ding
MoE · 21 Oct 2024

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin
MoE · 23 May 2024

From Sparse to Soft Mixtures of Experts
J. Puigcerver, C. Riquelme, Basil Mustafa, N. Houlsby
MoE · 02 Aug 2023

Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE · 18 Feb 2022

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 17 Sep 2019