ResearchTrend.AI
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
2 January 2023
Elias Frantar, Dan Alistarh
VLM
Links: arXiv (abs) · PDF · HTML · GitHub (799★)

Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

50 / 196 papers shown
Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method
Qingcheng Zhu
Yangyang Ren
L. Yang
Mingbao Lin
Yanjing Li
...
Haodong Zhu
Yuguang Yang
Juan Zhang
Runqi Wang
Baochang Zhang
MQ
1
0
0
24 Jul 2025
Progressive Binarization with Semi-Structured Pruning for LLMs
Xinyu Yan
Tianao Zhang
Zhiteng Li
Yulun Zhang
MQ
164
1
0
01 Jul 2025
Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps
Jiashun Cheng
Aochuan Chen
Nuo Chen
Ziqi Gao
Yuhan Li
Jia Li
Fugee Tsung
29
0
0
20 Jun 2025
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Samir Khaki
Xiuyu Li
Junxian Guo
Ligeng Zhu
Chenfeng Xu
Konstantinos N. Plataniotis
Amir Yazdanbakhsh
Kurt Keutzer
Song Han
Zhijian Liu
42
0
0
19 Jun 2025
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models
Yan Sun
Qixin Zhang
Zhiyuan Yu
Xikun Zhang
Li Shen
Dacheng Tao
41
0
0
15 Jun 2025
Training-free LLM Merging for Multi-task Learning
Zichuan Fu
Xian Wu
Y. X. R. Wang
Wanyu Wang
Shanshan Ye
Hongzhi Yin
Yi-Ju Chang
Yefeng Zheng
Xiangyu Zhao
MoMe
31
0
0
14 Jun 2025
Compression Aware Certified Training
Changming Xu
Gagandeep Singh
30
0
0
13 Jun 2025
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
Yeonju Ro
Zhenyu Zhang
Souvik Kundu
Zhangyang Wang
Aditya Akella
112
0
0
11 Jun 2025
Fairness is Not Silence: Unmasking Vacuous Neutrality in Small Language Models
Sumanth Manduru
Carlotta Domeniconi
ALM
34
0
0
10 Jun 2025
Olica: Efficient Structured Pruning of Large Language Models without Retraining
Jiujun He
Huazhen Lin
35
0
0
10 Jun 2025
SAFE: Finding Sparse and Flat Minima to Improve Pruning
Dongyeop Lee
Kwanhee Lee
Jinseok Chung
Namhoon Lee
52
0
0
07 Jun 2025
Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
Yuanzhe Hu
Kinshuk Goel
Vlad Killiakov
Yaoqing Yang
77
2
0
06 Jun 2025
BAQ: Efficient Bit Allocation Quantization for Large Language Models
Chao Zhang
Li Wang
S. Lasaulce
Mérouane Debbah
MQ
77
0
0
06 Jun 2025
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan
Zhuoming Chen
Haizhong Zheng
Yang Zhou
Emma Strubell
Beidi Chen
129
0
0
05 Jun 2025
SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling
Anhao Zhao
Fanghua Ye
Yingqi Fan
Junlong Tong
Zhiwei Fei
Hui Su
Xiaoyu Shen
76
0
0
04 Jun 2025
Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models
Seungcheol Park
Jeongin Bae
Beomseok Kwon
Minjun Kim
Byeongwook Kim
S. Kwon
U. Kang
Dongsoo Lee
MQ
158
0
0
04 Jun 2025
QA-HFL: Quality-Aware Hierarchical Federated Learning for Resource-Constrained Mobile Devices with Heterogeneous Image Quality
Sajid Hussain
Muhammad Sohail
Nauman Ali Khan
56
0
0
04 Jun 2025
MANBench: Is Your Multimodal Model Smarter than Human?
Han Zhou
Qitong Xu
Yiheng Dong
Xin Yang
30
0
0
04 Jun 2025
Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information
Seungcheol Park
Sojin Lee
Jongjin Kim
Jinsik Lee
Hyunjik Jo
U. Kang
86
2
0
04 Jun 2025
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
Xinyi Wang
Lirong Gao
Haobo Wang
Yiming Zhang
Junbo Zhao
MoE
53
0
0
31 May 2025
Smooth Model Compression without Fine-Tuning
Christina Runkel
Natacha Kuete Meli
Jovita Lukasik
A. Biguri
Carola-Bibiane Schönlieb
Michael Moeller
61
0
0
30 May 2025
DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration
Tianteng Gu
Bei Liu
Bo Xiao
Ke Zeng
Jiacheng Liu
Y. Qian
66
0
0
29 May 2025
TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks
X. Meng
Mehdi Makni
Rahul Mazumder
44
0
0
29 May 2025
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
Q. Xiao
Alan Ansell
Boqian Wu
Lu Yin
Mykola Pechenizkiy
Shiwei Liu
Decebal Constantin Mocanu
45
0
0
29 May 2025
ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi
Zhenglun Kong
Geng Yuan
Shaoyi Huang
66
0
0
28 May 2025
SlimLLM: Accurate Structured Pruning for Large Language Models
Jialong Guo
Xinghao Chen
Yehui Tang
Yunhe Wang
64
0
0
28 May 2025
DLP: Dynamic Layerwise Pruning in Large Language Models
Yuli Chen
B. Cheng
Jiale Han
Yingying Zhang
Yingting Li
Shuhao Zhang
56
0
0
27 May 2025
M-Wanda: Improving One-Shot Pruning for Multilingual LLMs
Rochelle Choenni
Ivan Titov
54
0
0
27 May 2025
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions
Hadi Askari
Shivanshu Gupta
Fei Wang
Anshuman Chhabra
Muhao Chen
TDI
72
0
0
27 May 2025
TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
Xiangyu Chen
Jing Liu
Ye Wang
Matthew Brand
Wang
T. Koike-Akino
117
0
0
27 May 2025
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Peijie Dong
Zhenheng Tang
Xiang Liu
Lujun Li
Xiaowen Chu
Bo Li
119
0
0
26 May 2025
ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai
Siyong Jian
Tuo Liang
Yu Yin
Huan Wang
57
0
0
26 May 2025
Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models
Viktoriia Chekalina
Daniil Moskovskiy
Daria Cherniuk
Maxim Kurkin
Andrey Kuznetsov
Evgeny Frolov
228
0
0
23 May 2025
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need?
Waleed Reda
Abhinav Jangda
Krishna Chintalapudi
134
0
0
23 May 2025
Two-Stage Regularization-Based Structured Pruning for LLMs
Mingkuan Feng
Jinyang Wu
Siyuan Liu
Shuai Zhang
Hongjian Fang
Ruihan Jin
Feihu Che
Pengpeng Shao
Zhengqi Wen
59
0
0
23 May 2025
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
Yue Li
Xin Yi
Dongsheng Shi
Gerard de Melo
Xiaoling Wang
Linlin Wang
75
0
0
22 May 2025
One-for-All Pruning: A Universal Model for Customized Compression of Large Language Models
Rongguang Ye
Ming Tang
58
0
0
18 May 2025
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
Ning Lu
Shengcai Liu
Jiahao Wu
Weiyu Chen
Zhirui Zhang
Yew-Soon Ong
Qi Wang
Ke Tang
116
3
0
17 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su
Yuechi Zhou
Quantong Qiu
Jilong Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
95
1
0
16 May 2025
Addition is almost all you need: Compressing neural networks with double binary factorization
Vladimír Boža
Vladimír Macko
MQ
161
0
0
16 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
Junxuan Zhang
Jue Wang
Yanjie Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
178
0
0
09 May 2025
Onboard Optimization and Learning: A Survey
Monirul Islam Pavel
Siyi Hu
Mahardhika Pratama
Ryszard Kowalczyk
73
0
0
07 May 2025
ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
Dmitriy Shopkhoev
Ammar Ali
Magauiya Zhussip
Valentin Malykh
Stamatios Lefkimmiatis
N. Komodakis
Sergey Zagoruyko
VLM
510
0
0
05 May 2025
Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models
Chuan Sun
Han Yu
Lizhen Cui
Xiaoxiao Li
458
3
0
03 May 2025
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta
Tanmoy Chakraborty
126
0
0
02 May 2025
BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters
Baz Roland
Kristina Malyseva
Anna Pappa
Tristan Cazenave
124
0
0
29 Apr 2025
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya
Rahmatollah Beheshti
303
0
0
23 Apr 2025
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Lawrence Liu
Inesh Chakrabarti
Yixiao Li
Mengdi Wang
Tuo Zhao
Lin F. Yang
MQ
97
0
0
20 Apr 2025
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
Akshat Ramachandran
Souvik Kundu
Arnab Raha
Shamik Kundu
Deepak K. Mathaikutty
Tushar Krishna
69
1
0
19 Apr 2025
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni
Jiachen Pu
Zhongyi Yang
Kun Zhou
Hui Wang
Xiaoliang Xiao
Dakui Wang
Xin Li
Jingfeng Luo
Conggang Hu
154
0
0
18 Apr 2025