2301.00774
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
2 January 2023
Elias Frantar
Dan Alistarh
VLM
Github (799★)
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"
46 / 196 papers shown
Title
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang
Yu Zheng
Zhongwei Wan
Mi Zhang
MQ
185
64
0
12 Mar 2024
IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
Ruikang Liu
Haoli Bai
Haokun Lin
Yuening Li
Han Gao
Zheng-Jun Xu
Lu Hou
Jun Yao
Chun Yuan
MQ
84
32
0
02 Mar 2024
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar
Tejaswini Pedapati
Ronny Luss
Soham Dan
Aurélie C. Lozano
Payel Das
Georgios Kollias
101
3
0
28 Feb 2024
SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
T. Yasuda
Kyriakos Axiotis
Gang Fu
M. Bateni
Vahab Mirrokni
220
0
0
27 Feb 2024
Data-free Weight Compress and Denoise for Large Language Models
Runyu Peng
Yunhua Zhou
Qipeng Guo
Yang Gao
Hang Yan
Xipeng Qiu
Dahua Lin
164
1
0
26 Feb 2024
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Zhuoming Chen
Avner May
Ruslan Svirschevski
Yuhsun Huang
Max Ryabinin
Zhihao Jia
Beidi Chen
110
53
0
19 Feb 2024
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan
Ercong Nie
Bolei Ma
Michael Farber
116
3
0
18 Feb 2024
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park
Jake Hyun
SangLyul Cho
Bonggeun Sim
Jae W. Lee
MQ
129
20
0
16 Feb 2024
Squat: Quant Small Language Models on the Edge
Xuan Shen
Zhenglun Kong
Zhaoyang Han
Changdi Yang
...
Lei Lu
Cheng Lyu
Zhihao Shu
Wei Niu
Miriam Leeser
MQ
138
22
0
16 Feb 2024
Towards Meta-Pruning via Optimal Transport
Alexander Theus
Olin Geimer
Friedrich Wicke
Thomas Hofmann
Sotiris Anagnostidis
Sidak Pal Singh
MoMe
91
4
0
12 Feb 2024
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li
Xuewen Liu
Jing Zhang
Qingyi Gu
MQ
109
7
0
08 Feb 2024
Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes
Lucio Dery
Steven Kolawole
Jean-Francois Kagey
Virginia Smith
Graham Neubig
Ameet Talwalkar
112
36
0
08 Feb 2024
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
Bowen Zhao
Hannaneh Hajishirzi
Qingqing Cao
151
21
0
22 Jan 2024
The LLM Surgeon
Tycho F. A. van der Ouderaa
Markus Nagel
M. V. Baalen
Yuki Markus Asano
Tijmen Blankevoort
114
18
0
28 Dec 2023
PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs
Max Zimmer
Megi Andoni
Christoph Spiegel
Sebastian Pokutta
VLM
195
10
0
23 Dec 2023
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
142
87
0
23 Dec 2023
Fluctuation-based Adaptive Structured Pruning for Large Language Models
Yongqi An
Xu Zhao
Tao Yu
Ming Tang
Jinqiao Wang
121
61
0
19 Dec 2023
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
Zhihang Yuan
Yuzhang Shang
Yue Song
Qiang Wu
Yan Yan
Guangyu Sun
MQ
138
61
0
10 Dec 2023
Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup
Maolin Wang
Yao-Min Zhao
Jiajia Liu
Jingdong Chen
Chenyi Zhuang
Jinjie Gu
Ruocheng Guo
Xiangyu Zhao
73
6
0
10 Dec 2023
CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Kaiyan Zhang
Ning Ding
Biqing Qi
Xuekai Zhu
Xinwei Long
Bowen Zhou
100
5
0
24 Oct 2023
NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models
Jongwoo Ko
Seungjoon Park
Yujin Kim
Sumyeong Ahn
Du-Seong Chang
Euijai Ahn
SeYoung Yun
115
6
0
16 Oct 2023
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Yuxin Zhang
Lirui Zhao
Mingbao Lin
Yunyun Sun
Yiwu Yao
Xingjia Han
Jared Tanner
Shiwei Liu
Rongrong Ji
SyDa
105
44
0
13 Oct 2023
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
149
103
0
08 Oct 2023
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
Chen Dun
Mirian Hipolito Garcia
Guoqing Zheng
Ahmed Hassan Awadallah
Anastasios Kyrillidis
Robert Sim
218
6
0
04 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zhangyang Wang
Yinfei Yang
MQ
132
50
0
02 Oct 2023
Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
Duc Hoang
Minsik Cho
Thomas Merth
Mohammad Rastegari
Zhangyang Wang
KELM
CLL
103
5
0
02 Oct 2023
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zhangyang Wang
99
7
0
29 Sep 2023
Pruning Large Language Models via Accuracy Predictor
Yupeng Ji
Yibo Cao
Jiu-si Liu
KELM
89
4
0
18 Sep 2023
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Liang Li
Qingyuan Li
Bo Zhang
Xiangxiang Chu
MQ
121
34
0
06 Sep 2023
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
Qingyuan Li
Yifan Zhang
Liang Li
Peng Yao
Bo Zhang
Xiangxiang Chu
Yerui Sun
Li-Qiang Du
Yuchen Xie
MQ
110
14
0
30 Aug 2023
Pruning vs Quantization: Which is Better?
Andrey Kuzmin
Markus Nagel
M. V. Baalen
Arash Behboodi
Tijmen Blankevoort
MQ
143
56
0
06 Jul 2023
Query Understanding in the Age of Large Language Models
Avishek Anand
Venktesh V
Abhijit Anand
Vinay Setty
LRM
111
5
0
28 Jun 2023
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
241
316
0
24 Jun 2023
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Ajay Jaiswal
Shiwei Liu
Tianlong Chen
Zhangyang Wang
VLM
98
34
0
06 Jun 2023
Intriguing Properties of Quantization at Scale
Arash Ahmadian
Saurabh Dash
Hongyu Chen
Bharat Venkitesh
Stephen Gou
Phil Blunsom
Ahmet Üstün
Sara Hooker
MQ
131
38
0
30 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
125
57
0
25 May 2023
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu
Guanchu Wang
Shaochen Zhong
Zhaozhuo Xu
Daochen Zha
...
Zhimeng Jiang
Kaixiong Zhou
Vipin Chaudhary
Shuai Xu
Helen Zhou
106
15
0
24 May 2023
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
Zhaozhuo Xu
Zirui Liu
Beidi Chen
Yuxin Tang
Jue Wang
Kaixiong Zhou
Helen Zhou
Anshumali Shrivastava
MQ
117
32
0
17 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
90
319
0
28 Apr 2023
Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models
D. Honegger
Konstantin Schurholt
Damian Borth
98
4
0
26 Apr 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
225
405
0
13 Mar 2023
Streaming Kernel PCA Algorithm With Small Space
Yichuan Deng
Zhao Song
Zifan Wang
Hangke Zhang
116
4
0
08 Mar 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
228
16
0
17 Feb 2023
A Comprehensive Review and a Taxonomy of Edge Machine Learning: Requirements, Paradigms, and Techniques
Wenbin Li
Hakim Hacid
Ebtesam Almazrouei
Merouane Debbah
107
13
0
16 Feb 2023
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
194
76
0
26 Oct 2022
An Information Theory-inspired Strategy for Automatic Network Pruning
Xiawu Zheng
Yuexiao Ma
Teng Xi
Gang Zhang
Errui Ding
Yuchao Li
Jie Chen
Yonghong Tian
Rongrong Ji
228
13
0
19 Aug 2021