arXiv: 2110.01786
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
5 October 2021
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Community: MoE
Papers citing "MoEfication: Transformer Feed-forward Layers are Mixtures of Experts" (6 of 56 shown)
Gaussian Error Linear Units (GELUs) — Dan Hendrycks, Kevin Gimpel (27 Jun 2016)

SQuAD: 100,000+ Questions for Machine Comprehension of Text — Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang [RALM] (16 Jun 2016)

Distilling the Knowledge in a Neural Network — Geoffrey E. Hinton, Oriol Vinyals, J. Dean [FedML] (09 Mar 2015)

Deep Learning of Representations: Looking Forward — Yoshua Bengio (02 May 2013)

Maxout Networks — Ian Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio [OOD] (18 Feb 2013)

Improving neural networks by preventing co-adaptation of feature detectors — Geoffrey E. Hinton, Nitish Srivastava, A. Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov [VLM] (03 Jul 2012)