
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
Papers citing "First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models"
10 citing papers (10 shown)
| Title |
|---|
| MoEfication: Transformer Feed-forward Layers are Mixtures of Experts — Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou |