Block Pruning For Faster Transformers
arXiv: 2109.04838
10 September 2021
François Lagunas
Ella Charlaix
Victor Sanh
Alexander M. Rush
VLM
Papers citing "Block Pruning For Faster Transformers" (50 of 153 shown)
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
Arnav Chavan
Raghav Magazine
Shubham Kushwaha
M. Debbah
Deepak Gupta
23
18
0
02 Feb 2024
A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park
Jaehyeon Choi
Sojin Lee
U. Kang
MQ
32
12
0
27 Jan 2024
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
Bowen Zhao
Hannaneh Hajishirzi
Qingqing Cao
29
17
0
22 Jan 2024
A Survey on Efficient Federated Learning Methods for Foundation Model Training
Herbert Woisetschläger
Alexander Isenko
Shiqiang Wang
R. Mayer
Hans-Arno Jacobsen
FedML
65
24
0
09 Jan 2024
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention
Zhen Tan
Tianlong Chen
Zhenyu Zhang
Huan Liu
52
14
0
22 Dec 2023
How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark
Eldar Kurtic
Torsten Hoefler
Dan Alistarh
42
3
0
21 Dec 2023
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao
Zhitian Xie
Chen Liang
Chenyi Zhuang
Jinjie Gu
70
12
0
20 Dec 2023
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
Can Jin
Tianjin Huang
Yihua Zhang
Mykola Pechenizkiy
Sijia Liu
Shiwei Liu
Tianlong Chen
VLM
36
26
0
03 Dec 2023
OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking
Chia-Hsuan Lee
Hao Cheng
Mari Ostendorf
47
4
0
16 Nov 2023
Tabdoor: Backdoor Vulnerabilities in Transformer-based Neural Networks for Tabular Data
Bart Pleiter
Behrad Tajalli
Stefanos Koffas
Gorka Abad
Jing Xu
Martha Larson
S. Picek
LMTD
AAML
48
1
0
13 Nov 2023
SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks
Mohammadreza Salehi
Sachin Mehta
Aditya Kusupati
Ali Farhadi
Hannaneh Hajishirzi
40
5
0
18 Oct 2023
NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models
Jongwoo Ko
Seungjoon Park
Yujin Kim
Sumyeong Ahn
Du-Seong Chang
Euijai Ahn
SeYoung Yun
16
4
0
16 Oct 2023
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models
Hang Shao
Bei Liu
Bo Xiao
Ke Zeng
Guanglu Wan
Yanmin Qian
44
17
0
14 Oct 2023
A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Takuma Udagawa
Aashka Trivedi
Michele Merler
Bishwaranjan Bhattacharjee
47
7
0
13 Oct 2023
MatFormer: Nested Transformer for Elastic Inference
Devvrit
Sneha Kudugunta
Aditya Kusupati
Tim Dettmers
Kaifeng Chen
...
Yulia Tsvetkov
Hannaneh Hajishirzi
Sham Kakade
Ali Farhadi
Prateek Jain
39
23
0
11 Oct 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
40
270
0
10 Oct 2023
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
Song Guo
Jiahang Xu
Li Zhang
Mao Yang
27
14
0
08 Oct 2023
Can pruning make Large Language Models more efficient?
Sia Gholami
Marwan Omar
28
12
0
06 Oct 2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
Roberto L. Castro
Andrei Ivanov
Diego Andrade
Tal Ben-Nun
B. Fraguela
Torsten Hoefler
27
15
0
03 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zhangyang Wang
Yinfei Yang
MQ
44
46
0
02 Oct 2023
A Comprehensive Review of Generative AI in Healthcare
Yasin Shokrollahi
Sahar Yarmohammadtoosky
Matthew M. Nikahd
Pengfei Dong
Xianqi Li
Linxia Gu
MedIm
AI4CE
27
19
0
01 Oct 2023
YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
Cyrus Zhou
Zack Hassman
Ruize Xu
Dhirpal Shah
Vaughn Richard
Yanjing Li
34
1
0
01 Oct 2023
Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zhangyang Wang
32
7
0
29 Sep 2023
SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling
Bokyeong Yoon
Yoonsang Han
Gordon Euhyun Moon
27
0
0
22 Sep 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Qiong Wu
Wei Yu
Yiyi Zhou
Shubin Huang
Xiaoshuai Sun
Rongrong Ji
VLM
26
7
0
04 Sep 2023
SP³: Enhancing Structured Pruning via PCA Projection
Yuxuan Hu
Jing Zhang
Zhe Zhao
Chengliang Zhao
Xiaodong Chen
Cuiping Li
Hong Chen
35
1
0
31 Aug 2023
Sparse Binary Transformers for Multivariate Time Series Modeling
Matt Gorbett
Hossein Shirazi
I. Ray
AI4TS
32
13
0
09 Aug 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park
Ho-Jin Choi
U. Kang
VLM
42
5
0
07 Aug 2023
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
45
63
0
16 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
23
25
0
30 Jun 2023
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs
Haihao Shen
Hengyu Meng
Bo Dong
Zhe Wang
Ofir Zafrir
...
Hanwen Chang
Qun Gao
Zi. Wang
Guy Boudoukh
Moshe Wasserblat
MoE
38
4
0
28 Jun 2023
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
Junyan Li
Li Zhang
Jiahang Xu
Yujing Wang
Shaoguang Yan
...
Ting Cao
Hao Sun
Weiwei Deng
Qi Zhang
Mao Yang
41
10
0
26 Jun 2023
Low-Rank Prune-And-Factorize for Language Model Compression
Siyu Ren
Kenny Q. Zhu
14
9
0
25 Jun 2023
Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard
26
0
0
23 Jun 2023
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
Yixiao Li
Yifan Yu
Qingru Zhang
Chen Liang
Pengcheng He
Weizhu Chen
Tuo Zhao
44
69
0
20 Jun 2023
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Ajay Jaiswal
Shiwei Liu
Tianlong Chen
Zhangyang Wang
VLM
34
33
0
06 Jun 2023
Binary and Ternary Natural Language Generation
Zechun Liu
Barlas Oğuz
Aasish Pappu
Yangyang Shi
Raghuraman Krishnamoorthi
MQ
41
6
0
02 Jun 2023
Accurate and Structured Pruning for Efficient Automatic Speech Recognition
Huiqiang Jiang
Li Zhang
Yuang Li
Yu-Huan Wu
Shijie Cao
Ting Cao
Yuqing Yang
Jinyu Li
Mao Yang
Lili Qiu
CVBM
65
9
0
31 May 2023
LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers
Xuanqing Liu
Zhuotao Liu
19
22
0
28 May 2023
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
Qingqing Cao
Bhargavi Paranjape
Hannaneh Hajishirzi
MLLM
VLM
15
21
0
27 May 2023
PruMUX: Augmenting Data Multiplexing with Model Compression
Yushan Su
Vishvak Murahari
Karthik R. Narasimhan
Keqin Li
25
3
0
24 May 2023
Infor-Coef: Information Bottleneck-based Dynamic Token Downsampling for Compact and Efficient language model
Wenxin Tan
22
0
0
21 May 2023
PDP: Parameter-free Differentiable Pruning is All You Need
Minsik Cho
Saurabh N. Adya
Devang Naik
VLM
17
11
0
18 May 2023
Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR
Hang Shao
Wei Wang
Bei Liu
Xun Gong
Haoyu Wang
Y. Qian
99
10
0
18 May 2023
Weight-Inherited Distillation for Task-Agnostic BERT Compression
Taiqiang Wu
Cheng-An Hou
Shanshan Lao
Jiayi Li
Ngai Wong
Zhe Zhao
Yujiu Yang
71
10
0
16 May 2023
Compressing audio CNNs with graph centrality based filter pruning
James A. King
Ashutosh Kumar Singh
Mark D. Plumbley
GNN
17
2
0
05 May 2023
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
E. Georganas
Dhiraj D. Kalamkar
K. Voronin
Abhisek Kundu
Antonio Noack
Hans Pabst
Alexander Breuer
A. Heinecke
16
2
0
25 Apr 2023
To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency
Daniel Fernando Campos
Chengxiang Zhai
24
2
0
05 Apr 2023
Efficient CNNs via Passive Filter Pruning
Arshdeep Singh
Mark D. Plumbley
24
1
0
05 Apr 2023
Physics-aware Roughness Optimization for Diffractive Optical Neural Networks
Shangli Zhou
Yingjie Li
Minhan Lou
Weilu Gao
Zhijie Shi
Cunxi Yu
Caiwen Ding
33
2
0
04 Apr 2023