ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Prune Once for All: Sparse Pre-Trained Language Models
Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
arXiv:2111.05754 · 10 November 2021 [VLM]

Papers citing "Prune Once for All: Sparse Pre-Trained Language Models"

50 of 59 citing papers shown.
  • Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need?
    Waleed Reda, Abhinav Jangda, Krishna Chintalapudi (23 May 2025)
  • FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
    Chaitali Bhattacharyya, Yeseong Kim (01 May 2025)
  • Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability
    Ashhadul Islam, S. Belhaouari, Amine Bermak (24 Feb 2025)
  • Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
    Yifei He, Yuzheng Hu, Yong Lin, Tong Zhang, Han Zhao (08 Jan 2025) [FedML, MoMe]
  • Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior
    Mingxuan Zhang, Y. Sun, F. Liang (01 Nov 2024)
  • On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
    Jaskirat Singh, Bram Adams, Ahmed E. Hassan (01 Nov 2024) [VLM]
  • MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers
    Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, R. Huang, Meng Li (23 Oct 2024) [MQ]
  • Application Specific Compression of Deep Learning Models
    Rohit Raj Rai, Angana Borah, Amit Awekar (09 Sep 2024)
  • MoDeGPT: Modular Decomposition for Large Language Model Compression
    Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu (19 Aug 2024)
  • Evaluating Zero-Shot Long-Context LLM Compression
    Chenyu Wang, Yihan Wang, Kai Li (10 Jun 2024)
  • VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning
    Oshin Dutta, Ritvik Gupta, Sumeet Agarwal (07 Jun 2024)
  • A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
    Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers (26 May 2024) [MoE]
  • HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models
    R. Sukthanker, Arber Zela, B. Staffler, Aaron Klein, Lennart Purucker, Jorg K. H. Franke, Frank Hutter (16 May 2024) [ELM]
  • CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
    Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini (12 Apr 2024)
  • MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
    Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci (08 Apr 2024) [VLM]
  • Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks
    Guanhua Ding, Zexi Ye, Zhen Zhong, Gang Li, David Shao (29 Mar 2024)
  • SEVEN: Pruning Transformer Model by Reserving Sentinels
    Jinying Xiao, Ping Li, Jie Nie, Zhe Tang (19 Mar 2024)
  • ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
    Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo (15 Mar 2024)
  • Unveiling Linguistic Regions in Large Language Models
    Zhihao Zhang, Jun Zhao, Qi Zhang, Tao Gui, Xuanjing Huang (22 Feb 2024)
  • Model Compression and Efficient Inference for Large Language Models: A Survey
    Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He (15 Feb 2024) [MQ]
  • How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark
    Eldar Kurtic, Torsten Hoefler, Dan Alistarh (21 Dec 2023)
  • Fluctuation-based Adaptive Structured Pruning for Large Language Models
    Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang (19 Dec 2023)
  • Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup
    Maolin Wang, Yao-Min Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao (10 Dec 2023)
  • Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
    Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen (03 Dec 2023) [VLM]
  • DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency
    Azhar Shaikh, Michael Cochez, Denis Diachkov, Michiel de Rijcke, Sahar Yousefi (09 Nov 2023)
  • One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models
    Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian (14 Oct 2023)
  • Compressing LLMs: The Truth is Rarely Pure and Never Simple
    Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang (02 Oct 2023) [MQ]
  • Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
    Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen (02 Oct 2023) [MoMe]
  • Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
    Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang (29 Sep 2023)
  • Ternary Singular Value Decomposition as a Better Parameterized Form in Linear Mapping
    Boyu Chen, Hanxuan Chen, Jiao He, Fengyu Sun, Shangling Jui (15 Aug 2023)
  • Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
    Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh (03 Aug 2023) [VLM]
  • A Survey of Techniques for Optimizing Transformer Inference
    Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani (16 Jul 2023)
  • BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models
    Phuoc-Hoan Charles Le, Xinlin Li (29 Jun 2023) [ViT, MQ]
  • An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs
    Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, ..., Hanwen Chang, Qun Gao, Zi. Wang, Guy Boudoukh, Moshe Wasserblat (28 Jun 2023) [MoE]
  • The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
    Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang (06 Jun 2023) [VLM]
  • LLM-Pruner: On the Structural Pruning of Large Language Models
    Xinyin Ma, Gongfan Fang, Xinchao Wang (19 May 2023)
  • PDP: Parameter-free Differentiable Pruning is All You Need
    Minsik Cho, Saurabh N. Adya, Devang Naik (18 May 2023) [VLM]
  • To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency
    Daniel Fernando Campos, Chengxiang Zhai (05 Apr 2023)
  • Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval
    Daniel Fernando Campos, ChengXiang Zhai (31 Mar 2023)
  • oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
    Daniel Fernando Campos, Alexandre Marques, Mark Kurtz, Chengxiang Zhai (30 Mar 2023) [VLM, AAML]
  • Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression
    Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh (25 Mar 2023) [MQ, VLM]
  • Text-Visual Prompting for Efficient 2D Temporal Video Grounding
    Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding (09 Mar 2023)
  • Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
    Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang (03 Mar 2023)
  • Rotation Invariant Quantization for Model Compression
    Dor-Joseph Kampeas, Yury Nahshan, Hanoch Kremer, Gil Lederman, Shira Zaloshinski, Zheng Li, E. Haleva (03 Mar 2023) [MQ]
  • MUX-PLMs: Data Multiplexing for High-throughput Language Models
    Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan (24 Feb 2023) [MoE]
  • HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
    Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao (19 Feb 2023) [VLM]
  • SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks
    Mahdi Nikdan, Tommaso Pegolotti, Eugenia Iofinova, Eldar Kurtic, Dan Alistarh (09 Feb 2023)
  • What Matters In The Structured Pruning of Generative Language Models?
    Michael Santacroce, Zixin Wen, Yelong Shen, Yuan-Fang Li (07 Feb 2023)
  • AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems
    Yuan Feng, Hyeran Jeon, F. Blagojevic, Cyril Guyot, Qing Li, Dong Li (23 Jan 2023) [GNN]
  • Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off
    Shaoyi Huang, Bowen Lei, Dongkuan Xu, Hongwu Peng, Yue Sun, Mimi Xie, Caiwen Ding (30 Nov 2022)