arXiv: 2301.00774
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar, Dan Alistarh
2 January 2023
Links: arXiv (abs) · PDF · HTML · GitHub (799★)
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" (50 of 196 papers shown)
Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method [MQ] (24 Jul 2025)
  Qingcheng Zhu, Yangyang Ren, L. Yang, Mingbao Lin, Yanjing Li, ..., Haodong Zhu, Yuguang Yang, Juan Zhang, Runqi Wang, Baochang Zhang

Progressive Binarization with Semi-Structured Pruning for LLMs [MQ] (01 Jul 2025)
  Xinyu Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang

Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps (20 Jun 2025)
  Jiashun Cheng, Aochuan Chen, Nuo Chen, Ziqi Gao, Yuhan Li, Jia Li, Fugee Tsung

SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity (19 Jun 2025)
  Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Chenfeng Xu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu

MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models (15 Jun 2025)
  Yan Sun, Qixin Zhang, Zhiyuan Yu, Xikun Zhang, Li Shen, Dacheng Tao

Training-free LLM Merging for Multi-task Learning [MoMe] (14 Jun 2025)
  Zichuan Fu, Xian Wu, Y. X. R. Wang, Wanyu Wang, Shanshan Ye, Hongzhi Yin, Yi-Ju Chang, Yefeng Zheng, Xiangyu Zhao
Compression Aware Certified Training (13 Jun 2025)
  Changming Xu, Gagandeep Singh

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention (11 Jun 2025)
  Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, Aditya Akella

Fairness is Not Silence: Unmasking Vacuous Neutrality in Small Language Models [ALM] (10 Jun 2025)
  Sumanth Manduru, Carlotta Domeniconi

Olica: Efficient Structured Pruning of Large Language Models without Retraining (10 Jun 2025)
  Jiujun He, Huazhen Lin

SAFE: Finding Sparse and Flat Minima to Improve Pruning (07 Jun 2025)
  Dongyeop Lee, Kwanhee Lee, Jinseok Chung, Namhoon Lee

Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias (06 Jun 2025)
  Yuanzhe Hu, Kinshuk Goel, Vlad Killiakov, Yaoqing Yang

BAQ: Efficient Bit Allocation Quantization for Large Language Models [MQ] (06 Jun 2025)
  Chao Zhang, Li Wang, S. Lasaulce, Mérouane Debbah

Kinetics: Rethinking Test-Time Scaling Laws (05 Jun 2025)
  Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen
SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling (04 Jun 2025)
  Anhao Zhao, Fanghua Ye, Yingqi Fan, Junlong Tong, Zhiwei Fei, Hui Su, Xiaoyu Shen

Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models [MQ] (04 Jun 2025)
  Seungcheol Park, Jeongin Bae, Beomseok Kwon, Minjun Kim, Byeongwook Kim, S. Kwon, U. Kang, Dongsoo Lee

QA-HFL: Quality-Aware Hierarchical Federated Learning for Resource-Constrained Mobile Devices with Heterogeneous Image Quality (04 Jun 2025)
  Sajid Hussain, Muhammad Sohail, Nauman Ali Khan

MANBench: Is Your Multimodal Model Smarter than Human? (04 Jun 2025)
  Han Zhou, Qitong Xu, Yiheng Dong, Xin Yang

Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information (04 Jun 2025)
  Seungcheol Park, Sojin Lee, Jongjin Kim, Jinsik Lee, Hyunjik Jo, U. Kang

FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts [MoE] (31 May 2025)
  Xinyi Wang, Lirong Gao, Haobo Wang, Yiming Zhang, Junbo Zhao
Smooth Model Compression without Fine-Tuning (30 May 2025)
  Christina Runkel, Natacha Kuete Meli, Jovita Lukasik, A. Biguri, Carola-Bibiane Schönlieb, Michael Moeller

DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration (29 May 2025)
  Tianteng Gu, Bei Liu, Bo Xiao, Ke Zeng, Jiacheng Liu, Y. Qian

TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks (29 May 2025)
  X. Meng, Mehdi Makni, Rahul Mazumder

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution (29 May 2025)
  Q. Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu

ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning (28 May 2025)
  Zhendong Mi, Zhenglun Kong, Geng Yuan, Shaoyi Huang

SlimLLM: Accurate Structured Pruning for Large Language Models (28 May 2025)
  Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang

DLP: Dynamic Layerwise Pruning in Large Language Models (27 May 2025)
  Yuli Chen, B. Cheng, Jiale Han, Yingying Zhang, Yingting Li, Shuhao Zhang
M-Wanda: Improving One-Shot Pruning for Multilingual LLMs (27 May 2025)
  Rochelle Choenni, Ivan Titov

LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions [TDI] (27 May 2025)
  Hadi Askari, Shivanshu Gupta, Fei Wang, Anshuman Chhabra, Muhao Chen

TuneComp: Joint Fine-tuning and Compression for Large Foundation Models (27 May 2025)
  Xiangyu Chen, Jing Liu, Ye Wang, Matthew Brand, Wang, T. Koike-Akino

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression (26 May 2025)
  Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, Bo Li

ResSVD: Residual Compensated SVD for Large Language Model Compression (26 May 2025)
  Haolei Bai, Siyong Jian, Tuo Liang, Yu Yin, Huan Wang

Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models (23 May 2025)
  Viktoriia Chekalina, Daniil Moskovskiy, Daria Cherniuk, Maxim Kurkin, Andrey Kuznetsov, Evgeny Frolov

Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? (23 May 2025)
  Waleed Reda, Abhinav Jangda, Krishna Chintalapudi
Two-Stage Regularization-Based Structured Pruning for LLMs (23 May 2025)
  Mingkuan Feng, Jinyang Wu, Siyuan Liu, Shuai Zhang, Hongjian Fang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models (22 May 2025)
  Yue Li, Xin Yi, Dongsheng Shi, Gerard de Melo, Xiaoling Wang, Linlin Wang

One-for-All Pruning: A Universal Model for Customized Compression of Large Language Models (18 May 2025)
  Rongguang Ye, Ming Tang

Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets (17 May 2025)
  Ning Lu, Shengcai Liu, Jiahao Wu, Weiyu Chen, Zhirui Zhang, Yew-Soon Ong, Qi Wang, Ke Tang

Accurate KV Cache Quantization with Outlier Tokens Tracing [MQ] (16 May 2025)
  Yi Su, Yuechi Zhou, Quantong Qiu, Jilong Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang

Addition is almost all you need: Compressing neural networks with double binary factorization [MQ] (16 May 2025)
  Vladimír Boža, Vladimír Macko
FloE: On-the-Fly MoE Inference on Memory-constrained GPU [MoE] (09 May 2025)
  Yuxin Zhou, Zheng Li, Junxuan Zhang, Jue Wang, Yanjie Wang, Zhongle Xie, Ke Chen, Lidan Shou

Onboard Optimization and Learning: A Survey (07 May 2025)
  Monirul Islam Pavel, Siyi Hu, Mahardhika Pratama, Ryszard Kowalczyk

ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization [VLM] (05 May 2025)
  Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis, N. Komodakis, Sergey Zagoruyko

Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models (03 May 2025)
  Chuan Sun, Han Yu, Lizhen Cui, Xiaoxiao Li

Position: Enough of Scaling LLMs! Lets Focus on Downscaling (02 May 2025)
  Ayan Sengupta, Tanmoy Chakraborty

BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters (29 Apr 2025)
  Baz Roland, Kristina Malyseva, Anna Pappa, Tristan Cazenave

ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs (23 Apr 2025)
  Fahmida Liza Piya, Rahmatollah Beheshti
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models [MQ] (20 Apr 2025)
  Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang

Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator (19 Apr 2025)
  Akshat Ramachandran, Souvik Kundu, Arnab Raha, Shamik Kundu, Deepak K. Mathaikutty, Tushar Krishna

From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs (18 Apr 2025)
  Jiliang Ni, Jiachen Pu, Zhongyi Yang, Kun Zhou, Hui Wang, Xiaoliang Xiao, Dakui Wang, Xin Li, Jingfeng Luo, Conggang Hu