First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

Papers citing "First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models"

Title: MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Authors: Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Published: 05 Oct 2021