
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
Papers citing "MoEfication: Transformer Feed-forward Layers are Mixtures of Experts"
Title |
---|
Pre-Trained Models: Past, Present and Future. Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ... Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu |