ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts

5 October 2021
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Topic: MoE

Papers citing "MoEfication: Transformer Feed-forward Layers are Mixtures of Experts"

6 of 56 citing papers shown
Gaussian Error Linear Units (GELUs)
Dan Hendrycks, Kevin Gimpel
163 / 4,958 / 0
27 Jun 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
Topic: RALM
184 / 8,067 / 0
16 Jun 2016
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
Topic: FedML
288 / 19,523 / 0
09 Mar 2015
Deep Learning of Representations: Looking Forward
Yoshua Bengio
163 / 679 / 0
02 May 2013
Maxout Networks
Ian Goodfellow, David Warde-Farley, M. Berk Mirza, Aaron Courville, Yoshua Bengio
Topic: OOD
204 / 2,176 / 0
18 Feb 2013
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton, Nitish Srivastava, A. Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov
Topic: VLM
408 / 7,650 / 0
03 Jul 2012