Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2208.06118
Cited By
An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers
12 August 2022
Chao Fang
Aojun Zhou
Zhongfeng Wang
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers"
8 / 8 papers shown
Title
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
82
76
0
07 May 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu
Qi Liu
Yuhui Xu
Aojun Zhou
Siyuan Huang
Bo-Wen Zhang
Junchi Yan
Hongsheng Li
MoE
32
25
0
22 Feb 2024
Spatial Re-parameterization for N:M Sparsity
Yuxin Zhang
Mingbao Lin
Mingliang Xu
Yonghong Tian
Rongrong Ji
44
2
0
09 Jun 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
101
0
27 Feb 2023
Bi-directional Masks for Efficient N:M Sparse Training
Yuxin Zhang
Yiting Luo
Mingbao Lin
Mingliang Xu
Jingjing Xie
Rongrong Ji
Rongrong Ji
49
15
0
13 Feb 2023
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
341
5,785
0
29 Apr 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler
Dan Alistarh
Tal Ben-Nun
Nikoli Dryden
Alexandra Peste
MQ
141
684
0
31 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
1