MoEfication: Transformer Feed-forward Layers are Mixtures of Experts

Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Papers citing "MoEfication: Transformer Feed-forward Layers are Mixtures of Experts"

BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
31 Dec 2020

Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
08 Jun 2020