Emergent Modularity in Pre-trained Transformers

Zhengyan Zhang
Yankai Lin
Chaojun Xiao
Xiaozhi Wang
Xu Han
Zhiyuan Liu
Maosong Sun
Jie Zhou
    MoE

Papers citing "Emergent Modularity in Pre-trained Transformers"

42 / 42 papers shown
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
05 Oct 2021
