Memory Augmented Language Models through Mixture of Word Experts

Papers citing "Memory Augmented Language Models through Mixture of Word Experts"

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
05 Oct 2021
