Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models



Papers citing "Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models"

56 citing papers (50 shown)
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
05 Oct 2021
