ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2208.06118
  4. Cited By
An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse
  Transformers

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

12 August 2022
Chao Fang
Aojun Zhou
Zhongfeng Wang
    MoE
ArXivPDFHTML

Papers citing "An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers"

8 / 8 papers shown
Title
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
82
76
0
07 May 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for
  Mixture-of-Experts Large Language Models
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu
Qi Liu
Yuhui Xu
Aojun Zhou
Siyuan Huang
Bo-Wen Zhang
Junchi Yan
Hongsheng Li
MoE
32
25
0
22 Feb 2024
Spatial Re-parameterization for N:M Sparsity
Spatial Re-parameterization for N:M Sparsity
Yuxin Zhang
Mingbao Lin
Mingliang Xu
Yonghong Tian
Rongrong Ji
44
2
0
09 Jun 2023
Full Stack Optimization of Transformer Inference: a Survey
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
101
0
27 Feb 2023
Bi-directional Masks for Efficient N:M Sparse Training
Bi-directional Masks for Efficient N:M Sparse Training
Yuxin Zhang
Yiting Luo
Mingbao Lin
Mingliang Xu
Jingjing Xie
Rongrong Ji
Rongrong Ji
49
15
0
13 Feb 2023
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
341
5,785
0
29 Apr 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference
  and training in neural networks
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler
Dan Alistarh
Tal Ben-Nun
Nikoli Dryden
Alexandra Peste
MQ
141
684
0
31 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
1