
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song
Xu Han
Zhengyan Zhang
Shengding Hu
Xiyu Shi
Kuai Li
Chen Chen
Zhiyuan Liu
Guanglin Li
Tao Yang
Maosong Sun
Papers citing "ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models"
Title |
---|
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts. Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou |