
Emergent Modularity in Pre-trained Transformers
Zhengyan Zhang
Yankai Lin
Chaojun Xiao
Xiaozhi Wang
Xu Han
Zhiyuan Liu
Maosong Sun
Jie Zhou
Papers citing "Emergent Modularity in Pre-trained Transformers"
42 / 42 papers shown
Title |
---|
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts. Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou |