
Memory Augmented Language Models through Mixture of Word Experts
Papers citing "Memory Augmented Language Models through Mixture of Word Experts"
| Title | Authors |
|---|---|
| MoEfication: Transformer Feed-forward Layers are Mixtures of Experts | Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou |