Block Pruning For Faster Transformers (arXiv:2109.04838)
François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush
10 September 2021
Papers citing "Block Pruning For Faster Transformers" (showing 50 of 153):
MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model. Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang. 03 Apr 2023.
Greener yet Powerful: Taming Large Code Generation Models with Quantization. Xiaokai Wei, Sujan Kumar Gonugondla, W. Ahmad, Shiqi Wang, Baishakhi Ray, ..., Ben Athiwaratkun, Mingyue Shang, M. K. Ramanathan, Parminder Bhatia, Bing Xiang. 09 Mar 2023.
Gradient-Free Structured Pruning with Unlabeled Data. Azade Nova, H. Dai, Dale Schuurmans. 07 Mar 2023.
Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang. 03 Mar 2023.
AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers. Shikhar Tuli, N. Jha. 28 Feb 2023.
Full Stack Optimization of Transformer Inference: a Survey. Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami. 27 Feb 2023.
Elementwise Language Representation. Du-Yeong Kim, Jeeeun Kim. 27 Feb 2023.
MUX-PLMs: Data Multiplexing for High-throughput Language Models. Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik R. Narasimhan. 24 Feb 2023.
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers. Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao. 19 Feb 2023.
SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks. Mahdi Nikdan, Tommaso Pegolotti, Eugenia Iofinova, Eldar Kurtic, Dan Alistarh. 09 Feb 2023.
What Matters In The Structured Pruning of Generative Language Models? Michael Santacroce, Zixin Wen, Yelong Shen, Yuan-Fang Li. 07 Feb 2023.
ZipLM: Inference-Aware Structured Pruning of Language Models. Eldar Kurtic, Elias Frantar, Dan Alistarh. 07 Feb 2023.
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases. Xiaoxia Wu, Cheng-rong Li, Reza Yazdani Aminabadi, Z. Yao, Yuxiong He. 27 Jan 2023.
PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation. Ningxin Zheng, Huiqiang Jiang, Quan Zhang, Zhenhua Han, Yuqing Yang, ..., Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou. 26 Jan 2023.
Gradient-based Intra-attention Pruning on Pre-trained Language Models. Ziqing Yang, Yiming Cui, Xin Yao, Shijin Wang. 15 Dec 2022.
On the Effectiveness of Parameter-Efficient Fine-Tuning. Z. Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier. 28 Nov 2022.
Structured Pruning Adapters. Lukas Hedegaard, Aman Alok, Juby Jose, Alexandros Iosifidis. 17 Nov 2022.
A Survey for Efficient Open Domain Question Answering. Qin Zhang, Shan Chen, Dongkuan Xu, Qingqing Cao, Xiaojun Chen, Trevor Cohn, Meng Fang. 15 Nov 2022.
Intriguing Properties of Compression on Multilingual Models. Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer. 04 Nov 2022.
Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models. Stelios Maroudas, Sotiris Legkas, Prodromos Malakasiotis, Ilias Chalkidis. 24 Oct 2022.
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning. Tiannan Wang, Wangchunshu Zhou, Yan Zeng, Xinsong Zhang. 14 Oct 2022.
GMP*: Well-Tuned Gradual Magnitude Pruning Can Outperform Most BERT-Pruning Methods. Eldar Kurtic, Dan Alistarh. 12 Oct 2022.
Efficient Quantized Sparse Matrix Operations on Tensor Cores. Shigang Li, Kazuki Osawa, Torsten Hoefler. 14 Sep 2022.
SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance. Li Zhang, Youkow Homma, Yujing Wang, Min-man Wu, Mao Yang, Ruofei Zhang, Ting Cao, Wei Shen. 30 Aug 2022.
SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning. Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze. 11 Jul 2022.
Gender Biases and Where to Find Them: Exploring Gender Bias in Pre-Trained Transformer-based Language Models Using Movement Pruning. Przemyslaw K. Joniak, Akiko Aizawa. 06 Jul 2022.
Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding. Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu. 30 Jun 2022.
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance. Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, T. Zhao. 25 Jun 2022.
Extreme Compression for Pre-trained Transformers Made Simple and Efficient. Xiaoxia Wu, Z. Yao, Minjia Zhang, Conglong Li, Yuxiong He. 04 Jun 2022.
Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks. Lukas Hauzenberger, Shahed Masoudian, Deepak Kumar, Markus Schedl, Navid Rekabsaz. 30 May 2022.
MiniDisc: Minimal Distillation Schedule for Language Model Compression. Chen Zhang, Yang Yang, Qifan Wang, Jiahao Liu, Jingang Wang, Wei Wu, Dawei Song. 29 May 2022.
Spartan: Differentiable Sparsity via Regularized Transportation. Kai Sheng Tai, Taipeng Tian, Ser-Nam Lim. 27 May 2022.
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models. Clara Na, Sanket Vaibhav Mehta, Emma Strubell. 25 May 2022.
Exploring Extreme Parameter Compression for Pre-trained Language Models. Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu. 20 May 2022.
Towards Climate Awareness in NLP Research. Daniel Hershcovich, Nicolas Webersinke, Mathias Kraus, J. Bingler, Markus Leippold. 10 May 2022.
Monarch: Expressive Structured Matrices for Efficient and Accurate Training. Tri Dao, Beidi Chen, N. Sohoni, Arjun D Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré. 01 Apr 2022.
Structured Pruning Learns Compact and Accurate Models. Mengzhou Xia, Zexuan Zhong, Danqi Chen. 01 Apr 2022.
TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models. Ziqing Yang, Yiming Cui, Zhigang Chen. 30 Mar 2022.
A Fast Post-Training Pruning Framework for Transformers. Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami. 29 Mar 2022.
Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models. Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos. 23 Mar 2022.
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models. Eldar Kurtic, Daniel Fernando Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Ben Fineran, Michael Goin, Dan Alistarh. 14 Mar 2022.
Dynamic N:M Fine-grained Structured Sparse Attention Mechanism. Zhaodong Chen, Yuying Quan, Zheng Qu, L. Liu, Yufei Ding, Yuan Xie. 28 Feb 2022.
A Survey on Model Compression and Acceleration for Pretrained Language Models. Canwen Xu, Julian McAuley. 15 Feb 2022.
SPDY: Accurate Pruning with Speedup Guarantees. Elias Frantar, Dan Alistarh. 31 Jan 2022.
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He. 14 Jan 2022.
Two Sparsities Are Better Than One: Unlocking the Performance Benefits of Sparse-Sparse Networks. Kevin Lee Hunter, Lawrence Spracklen, Subutai Ahmad. 27 Dec 2021.
Pruning Pretrained Encoders with a Multitask Objective. Patrick Xia, Richard Shin. 10 Dec 2021.
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models. Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré. 30 Nov 2021.
Prune Once for All: Sparse Pre-Trained Language Models. Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat. 10 Nov 2021.
The Efficiency Misnomer. Daoyuan Chen, Liuyi Yao, Dawei Gao, Ashish Vaswani, Yaliang Li. 25 Oct 2021.