Prune Once for All: Sparse Pre-Trained Language Models (arXiv 2111.05754)
10 November 2021
Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
Papers citing "Prune Once for All: Sparse Pre-Trained Language Models" (50 of 59 shown)
- Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? Waleed Reda, Abhinav Jangda, Krishna Chintalapudi. 23 May 2025.
- FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation. Chaitali Bhattacharyya, Yeseong Kim. 01 May 2025.
- Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability. Ashhadul Islam, S. Belhaouari, Amine Bermak. 24 Feb 2025.
- Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic. Yifei He, Yuzheng Hu, Yong Lin, Tong Zhang, Han Zhao. [FedML, MoMe] 08 Jan 2025.
- Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior. Mingxuan Zhang, Y. Sun, F. Liang. 01 Nov 2024.
- On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance. Jaskirat Singh, Bram Adams, Ahmed E. Hassan. [VLM] 01 Nov 2024.
- MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers. Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, R. Huang, Meng Li. [MQ] 23 Oct 2024.
- Application Specific Compression of Deep Learning Models. Rohit Raj Rai, Angana Borah, Amit Awekar. 09 Sep 2024.
- MoDeGPT: Modular Decomposition for Large Language Model Compression. Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu. 19 Aug 2024.
- Evaluating Zero-Shot Long-Context LLM Compression. Chenyu Wang, Yihan Wang, Kai Li. 10 Jun 2024.
- VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning. Oshin Dutta, Ritvik Gupta, Sumeet Agarwal. 07 Jun 2024.
- A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts. Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers. [MoE] 26 May 2024.
- HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models. R. Sukthanker, Arber Zela, B. Staffler, Aaron Klein, Lennart Purucker, Jorg K. H. Franke, Frank Hutter. [ELM] 16 May 2024.
- CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models. Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini. 12 Apr 2024.
- MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning. Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci. [VLM] 08 Apr 2024.
- Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks. Guanhua Ding, Zexi Ye, Zhen Zhong, Gang Li, David Shao. 29 Mar 2024.
- SEVEN: Pruning Transformer Model by Reserving Sentinels. Jinying Xiao, Ping Li, Jie Nie, Zhe Tang. 19 Mar 2024.
- ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference. Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo. 15 Mar 2024.
- Unveiling Linguistic Regions in Large Language Models. Zhihao Zhang, Jun Zhao, Qi Zhang, Tao Gui, Xuanjing Huang. 22 Feb 2024.
- Model Compression and Efficient Inference for Large Language Models: A Survey. Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He. [MQ] 15 Feb 2024.
- How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark. Eldar Kurtic, Torsten Hoefler, Dan Alistarh. 21 Dec 2023.
- Fluctuation-based Adaptive Structured Pruning for Large Language Models. Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang. 19 Dec 2023.
- Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup. Maolin Wang, Yao-Min Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao. 10 Dec 2023.
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective. Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen. [VLM] 03 Dec 2023.
- DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency. Azhar Shaikh, Michael Cochez, Denis Diachkov, Michiel de Rijcke, Sahar Yousefi. 09 Nov 2023.
- One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models. Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian. 14 Oct 2023.
- Compressing LLMs: The Truth is Rarely Pure and Never Simple. Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang. [MQ] 02 Oct 2023.
- Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy. Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen. [MoMe] 02 Oct 2023.
- Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs. Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang. 29 Sep 2023.
- Ternary Singular Value Decomposition as a Better Parameterized Form in Linear Mapping. Boyu Chen, Hanxuan Chen, Jiao He, Fengyu Sun, Shangling Jui. 15 Aug 2023.
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization. Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh. [VLM] 03 Aug 2023.
- A Survey of Techniques for Optimizing Transformer Inference. Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani. 16 Jul 2023.
- BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models. Phuoc-Hoan Charles Le, Xinlin Li. [ViT, MQ] 29 Jun 2023.
- An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs. Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, ..., Hanwen Chang, Qun Gao, Zi. Wang, Guy Boudoukh, Moshe Wasserblat. [MoE] 28 Jun 2023.
- The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter. Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang. [VLM] 06 Jun 2023.
- LLM-Pruner: On the Structural Pruning of Large Language Models. Xinyin Ma, Gongfan Fang, Xinchao Wang. 19 May 2023.
- PDP: Parameter-free Differentiable Pruning is All You Need. Minsik Cho, Saurabh N. Adya, Devang Naik. [VLM] 18 May 2023.
- To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency. Daniel Fernando Campos, Chengxiang Zhai. 05 Apr 2023.
- Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval. Daniel Fernando Campos, ChengXiang Zhai. 31 Mar 2023.
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes. Daniel Fernando Campos, Alexandre Marques, Mark Kurtz, Chengxiang Zhai. [VLM, AAML] 30 Mar 2023.
- Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression. Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh. [MQ, VLM] 25 Mar 2023.
- Text-Visual Prompting for Efficient 2D Temporal Video Grounding. Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding. 09 Mar 2023.
- Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang. 03 Mar 2023.
- Rotation Invariant Quantization for Model Compression. Dor-Joseph Kampeas, Yury Nahshan, Hanoch Kremer, Gil Lederman, Shira Zaloshinski, Zheng Li, E. Haleva. [MQ] 03 Mar 2023.
- MUX-PLMs: Data Multiplexing for High-throughput Language Models. Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan. [MoE] 24 Feb 2023.
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers. Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao. [VLM] 19 Feb 2023.
- SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks. Mahdi Nikdan, Tommaso Pegolotti, Eugenia Iofinova, Eldar Kurtic, Dan Alistarh. 09 Feb 2023.
- What Matters In The Structured Pruning of Generative Language Models? Michael Santacroce, Zixin Wen, Yelong Shen, Yuan-Fang Li. 07 Feb 2023.
- AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems. Yuan Feng, Hyeran Jeon, F. Blagojevic, Cyril Guyot, Qing Li, Dong Li. [GNN] 23 Jan 2023.
- Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off. Shaoyi Huang, Bowen Lei, Dongkuan Xu, Hongwu Peng, Yue Sun, Mimi Xie, Caiwen Ding. 30 Nov 2022.