SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar, Dan Alistarh · 2 January 2023 · arXiv: 2301.00774 (v3, latest) · VLM
Links: ArXiv (abs) · PDF · HTML · HuggingFace (3 upvotes) · GitHub (799★)
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" (50 of 288 shown)
DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
Y. Park, Jake Hyun, Hojoon Kim, Jae W. Lee · MQ · 226 / 1 / 0 · 28 Dec 2024
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
Binrui Zeng, Shezheng Song, Xiaodong Liu, Jie Yu, Huijun Liu, Jun Ma, Xiaopeng Li, Shasha Li, Xinran Hong, Yongtao Tang · MQ · 178 / 1 / 0 · 24 Dec 2024
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng, Songwei Liu, Shu Yang, Fangmin Chen, Lean Fu, Xing Mei · MQ · 227 / 2 / 0 · 23 Dec 2024
DiffKV: Differentiated Memory Management for Large Language Models with Parallel KV Compaction
Symposium on Operating Systems Principles (SOSP), 2024
Yanqi Zhang, Yuwei Hu, Runyuan Zhao, John C. S. Lui, Haibo Chen · MQ · 415 / 9 / 0 · 04 Dec 2024
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici, Davide Belli, M. V. Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul N. Whatmough · 279 / 6 / 0 · 02 Dec 2024
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Andrii Skliar, T. V. Rozendaal, Romain Lepert, Todor Boinovski, M. V. Baalen, Markus Nagel, Paul N. Whatmough, B. Bejnordi · MoE · 226 / 7 / 0 · 27 Nov 2024
Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu, Hao Cheng, Yujie Fang, Zeyu Wang, Jiaheng Wei, Dongwei Xu, Qi Xuan, Xiaoniu Yang, Zhaowei Zhu · 207 / 9 / 0 · 23 Nov 2024
Layer Pruning with Consensus: A Triple-Win Solution
Leandro Giusti Mugnaini, Carolina Tavares Duarte, Anna Helena Reali Costa, Artur Jordao · 166 / 1 / 0 · 21 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu · VLM · 324 / 1 / 0 · 21 Nov 2024
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Zehua Pei, Hui-Ling Zhen, Xianzhi Yu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu · AI4CE · 336 / 3 / 0 · 21 Nov 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti, Leonardo Lucio Custode, Giovanni Iacca · 272 / 1 / 0 · 11 Nov 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin, Yanzhao Wu · 239 / 10 / 0 · 05 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo, Chenyang Song, Xu Han, Yuxiao Chen, Chaojun Xiao, Zhiyuan Liu, Maosong Sun, Jiansheng Wei · 311 / 11 / 0 · 04 Nov 2024
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song, Zihang Zhong, Rong Chen, Haibo Chen · MoE · 219 / 9 / 0 · 29 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu, Huck Yang, Nai Chit Fung, Charbel Sakr, Hongxu Yin, ..., Jan Kautz, Yu-Chun Wang, Pavlo Molchanov, Min-Hung Chen · MQ · 186 / 2 / 0 · 28 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang · 146 / 5 / 0 · 23 Oct 2024
Self-calibration for Language Model Quantization and Pruning
Miles Williams, G. Chrysostomou, Nikolaos Aletras · MQ · 665 / 0 / 0 · 22 Oct 2024
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
Oliver Sieberling, Denis Kuznedelev, Eldar Kurtic, Dan Alistarh · MQ · 172 / 6 / 0 · 18 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Cunchun Li, Yongbin Li · 272 / 24 / 0 · 17 Oct 2024
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
Yingsong Luo, Ling Chen · MQ · 159 / 0 / 0 · 16 Oct 2024
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Akriti Jain, Saransh Sharma, Koyel Mukherjee, Soumyabrata Pal · 133 / 1 / 0 · 16 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen, Bike Xie, Jundong Li, Cong Shen · MQ · 259 / 3 / 0 · 16 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song · 365 / 24 / 0 · 14 Oct 2024
Chip-Tuning: Classify Before Language Models Say
Fangwei Zhu, Dian Li, Jiajun Huang, Gang Liu, Hui Wang, Zhifang Sui · 97 / 0 / 0 · 09 Oct 2024
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
Ruijia Niu, D. Wu, Rose Yu, Yi-An Ma · 225 / 2 / 0 · 09 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi · MoE · 168 / 13 / 0 · 08 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai · 321 / 30 / 0 · 06 Oct 2024
Aggressive Post-Training Compression on Extremely Large Language Models
Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang · 54 / 0 / 0 · 30 Sep 2024
Pruning Multilingual Large Language Models for Multilingual Inference
Hwichan Kim, Jun Suzuki, Tosho Hirasawa, Mamoru Komachi · 163 / 0 / 0 · 25 Sep 2024
Demystifying Issues, Causes and Solutions in LLM Open-Source Projects
Yangxiao Cai, Peng Liang, Yifei Wang, Zengyang Li, Mojtaba Shahin · 180 / 3 / 0 · 25 Sep 2024
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang, Vardan Papyan · VLM · 314 / 13 / 0 · 20 Sep 2024
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Yuezhou Hu, Jun-Jie Zhu, Jianfei Chen · 195 / 3 / 0 · 13 Sep 2024
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Daniel F Campos, Z. Yao, Yuxiong He · 108 / 7 / 0 · 10 Sep 2024
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
Yao Shu, Wenyang Hu, Szu Hui Ng, Bryan Kian Hsiang Low, Fei Richard Yu · FedML · 226 / 2 / 0 · 10 Sep 2024
Achieving Peak Performance for Large Language Models: A Systematic Review
Z. R. K. Rostam, Sándor Szénási, Gábor Kertész · 156 / 12 / 0 · 07 Sep 2024
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
Maxim Zhelnin, Viktor Moskvoretskii, Egor Shvetsov, Egor Venediktov, Mariya Krylova, Aleksandr Zuev, Evgeny Burnaev · 143 / 6 / 0 · 27 Aug 2024
A Tighter Complexity Analysis of SparseGPT
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song · 180 / 25 / 0 · 22 Aug 2024
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh · MQ · 130 / 22 / 0 · 21 Aug 2024
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
Chi Ma, Mincong Huang, Ying Zhang, Chao Wang, Yujie Wang, Lei Yu, Chuan Liu, Wei Lin · AI4CE, LLMSV · 122 / 3 / 0 · 21 Aug 2024
LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Zhengfei Chen, G. Chesi, Ngai Wong, Hao Yu · 94 / 2 / 0 · 20 Aug 2024
Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches
Yanjie Dong, Xiaoyi Fan, Fangxin Wang, Chengming Li, Victor C. M. Leung, Xiping Hu · 111 / 8 / 0 · 20 Aug 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu · 294 / 21 / 0 · 19 Aug 2024
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning
Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi, Josh Kimball, Ling Liu · AAML, MoMe · 210 / 40 / 0 · 18 Aug 2024
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Xianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang, Dongliang Xu · 221 / 15 / 0 · 16 Aug 2024
Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations
Leo Donisch, Sigurd Schacht, Carsten Lanquillon · 113 / 3 / 0 · 06 Aug 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo · 256 / 34 / 0 · 30 Jul 2024
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li, Yijun Dong, Qi Lei · 151 / 8 / 0 · 26 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas M. Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis · 173 / 9 / 0 · 12 Jul 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, ..., Anurag J. Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen · KELM, MU · 297 / 4 / 0 · 09 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang · 287 / 101 / 0 · 09 Jul 2024