ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

Chenyang Song
Xu Han
Zhengyan Zhang
Shengding Hu
Xiyu Shi
Kuai Li
Chen Chen
Zhiyuan Liu
Guanglin Li
Tao Yang
Maosong Sun

Papers citing "ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models"

38 of 38 citing papers shown
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
05 Oct 2021
