Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.00774
Cited By
v1
v2
v3 (latest)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
2 January 2023
Elias Frantar
Dan Alistarh
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (799★)
Papers citing
"SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"
50 / 196 papers shown
Title
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Zehua Pei
Hui-Ling Zhen
Xianzhi Yu
Sinno Jialin Pan
Mingxuan Yuan
Bei Yu
AI4CE
263
3
0
21 Nov 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
174
0
0
11 Nov 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin
Yanzhao Wu
175
5
0
05 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Yuxiao Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
Jiansheng Wei
Zhiyuan Liu
Maosong Sun
167
7
0
04 Nov 2024
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song
Zihang Zhong
Rong Chen
Haibo Chen
MoE
145
6
0
29 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Charbel Sakr
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
Min-Hung Chen
MQ
137
0
0
28 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
98
2
0
23 Oct 2024
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
503
0
0
22 Oct 2024
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
Oliver Sieberling
Denis Kuznedelev
Eldar Kurtic
Dan Alistarh
MQ
83
5
0
18 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Sihang Li
Yongbin Li
183
10
0
17 Oct 2024
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
Yingsong Luo
Ling Chen
MQ
111
0
0
16 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
137
3
0
16 Oct 2024
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Akriti Jain
Saransh Sharma
Koyel Mukherjee
Soumyabrata Pal
80
1
0
16 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
288
22
0
14 Oct 2024
Chip-Tuning: Classify Before Language Models Say
Fangwei Zhu
Dian Li
Jiajun Huang
Gang Liu
Hui Wang
Zhifang Sui
76
0
0
09 Oct 2024
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
Ruijia Niu
D. Wu
Rose Yu
Yi-An Ma
137
2
0
09 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
134
4
0
08 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
220
21
0
06 Oct 2024
Aggressive Post-Training Compression on Extremely Large Language Models
Zining Zhang
Yao Chen
Bingsheng He
Zhenjie Zhang
46
0
0
30 Sep 2024
Demystifying Issues, Causes and Solutions in LLM Open-Source Projects
Yangxiao Cai
Peng Liang
Yifei Wang
Zengyang Li
Mojtaba Shahin
110
2
0
25 Sep 2024
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang
Vardan Papyan
VLM
176
3
0
20 Sep 2024
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
Jaeseong Lee
Seung-won Hwang
Aurick Qiao
Daniel F Campos
Z. Yao
Yuxiong He
70
3
0
10 Sep 2024
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
Yao Shu
Wenyang Hu
Szu Hui Ng
Bryan Kian Hsiang Low
Fei Richard Yu
FedML
105
2
0
10 Sep 2024
Achieving Peak Performance for Large Language Models: A Systematic Review
Z. R. K. Rostam
Sándor Szénási
Gábor Kertész
107
6
0
07 Sep 2024
Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches
Yanjie Dong
Xiaoyi Fan
Fangxin Wang
Chengming Li
Victor C. M. Leung
Xiping Hu
56
5
0
20 Aug 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
175
13
0
19 Aug 2024
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Xianzhen Luo
Yixuan Wang
Qingfu Zhu
Zhiming Zhang
Xuanyu Zhang
Qing Yang
Dongliang Xu
128
9
0
16 Aug 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
Yuhui Xu
Zhanming Jie
Hanze Dong
Lei Wang
Xudong Lu
Aojun Zhou
Amrita Saha
Caiming Xiong
Doyen Sahoo
189
24
0
30 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
Volkan Cevher
Yida Wang
George Karypis
129
5
0
12 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
202
56
0
09 Jul 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson
Kyle O'Brien
Tianjin Huang
Shanghua Gao
Shiwei Liu
...
Anurag J. Vaidya
Faisal Mahmood
Marinka Zitnik
Tianlong Chen
Thomas Hartvigsen
KELM
MU
214
4
0
09 Jul 2024
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
Zhichao Xu
Ashim Gupta
Tao Li
Oliver Bentham
Vivek Srikumar
118
13
0
06 Jul 2024
Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
Aaron Mueller
CML
94
10
0
05 Jul 2024
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing
Boyan Gao
Zheng Zhang
David A. Clifton
Shitao Xiao
Li Du
Guoqi Li
Jiajun Zhang
174
6
0
05 Jul 2024
Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He
Jun Zhang
Shengjie Luo
Jingjing Xu
Zongzhang Zhang
Di He
KELM
111
1
0
03 Jul 2024
Learning Neural Networks with Sparse Activations
Pranjal Awasthi
Nishanth Dikkala
Pritish Kamath
Raghu Meka
153
4
0
26 Jun 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu
Zhan Qin
Han Wang
Zhao Yang
Zecheng Wang
...
Zhao Lv
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
136
2
0
24 Jun 2024
BlockPruner: Fine-grained Pruning for Large Language Models
Longguang Zhong
Fanqi Wan
Ruijun Chen
Xiaojun Quan
Liangzhi Li
113
10
0
15 Jun 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma
Ayan Chakraborty
Elizaveta Kostenok
Danila Mishin
Dongho Ha
...
Martin Jaggi
Ming Liu
Yunho Oh
Suvinay Subramanian
Amir Yazdanbakhsh
MQ
108
10
0
31 May 2024
Occam Gradient Descent
B. N. Kausik
ODL
VLM
87
0
0
30 May 2024
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Chankyu Lee
Rajarshi Roy
Mengyao Xu
Jonathan Raiman
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
RALM
323
207
0
27 May 2024
Large Language Model Pruning
Hanjuan Huang
Hao-Jia Song
H. Pao
132
0
0
24 May 2024
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang
Haotong Qin
Yangdong Liu
Yawei Li
Qinshuo Liu
Xianglong Liu
Luca Benini
Michele Magno
Shiming Zhang
Xiaojuan Qi
MQ
144
20
0
23 May 2024
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Tyler Griggs
Xiaoxuan Liu
Jiaxiang Yu
Doyoung Kim
Wei-Lin Chiang
Alvin Cheung
Ion Stoica
120
18
0
22 Apr 2024
SparseDM: Toward Sparse Efficient Diffusion Models
Kafeng Wang
Jianfei Chen
He Li
Zhenpeng Mi
Jun-Jie Zhu
DiffM
197
10
0
16 Apr 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
146
56
0
15 Apr 2024
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
Longwei Zou
Qingyang Wang
Han Zhao
Tingfeng Liu
Yi Yang
Yangdong Deng
117
0
0
10 Apr 2024
Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind
Hongchuan Zeng
Hongshen Xu
Lu Chen
Kai Yu
106
7
0
06 Apr 2024
LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models
Mingxing Peng
Xusen Guo
Xianda Chen
Meixin Zhu
Kehua Chen
Hao
Hao Yang
Xuesong Wang
Yinhai Wang
LRM
112
21
0
27 Mar 2024
AI and Memory Wall
A. Gholami
Z. Yao
Sehoon Kim
Coleman Hooper
Michael W. Mahoney
Kurt Keutzer
95
165
0
21 Mar 2024
Previous
1
2
3
4
Next