ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

2 January 2023
Elias Frantar
Dan Alistarh
[VLM]

arXiv (abs) · PDF · HTML · HuggingFace (3 upvotes) · GitHub (799★)
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

50 / 287 papers shown

• lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
  Haoxin Wang, Xiaolong Tu, Hongyu Ke, Huirong Chai, Dawei Chen, Kyungtae Han (07 Oct 2025)

• Accelerating Attention with Basis Decomposition
  Jialin Zhao (02 Oct 2025)

• Small is Sufficient: Reducing the World AI Energy Consumption Through Model Selection
  Tiago da Silva Barros, Frédéric Giroire, Ramon Aparicio-Pardo, Joanna Moulierac (02 Oct 2025)

• Layer-wise dynamic rank for compressing large language models
  Zhendong Mi, Bian Sun, Grace Li Zhang, Shaoyi Huang (30 Sep 2025) [ALM]

• PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning
  Xin Yu, Cong Xie, Ziyu Zhao, Tiantian Fan, Lingzhou Xue, Zhi-Li Zhang (30 Sep 2025)

• Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization
  Chris Kolb, Laetitia Frost, B. Bischl, David Rügamer (28 Sep 2025)

• PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
  Younes Hourri, Mohammad Mozaffari, M. Dehnavi (27 Sep 2025)

• Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
  Tianao Zhang, Zhiteng Li, Xianglong Yan, Haotong Qin, Yong Guo, Yulun Zhang (27 Sep 2025) [MQ]

• COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
  Dmitriy Shopkhoev, Denis Makhov, Magauiya Zhussip, Ammar Ali, Stamatios Lefkimmiatis (26 Sep 2025)

• Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs
  Shirin Alanova, Kristina Kazistova, Ekaterina Galaeva, Alina Kostromina, Vladimir Smirnov, Redko Dmitry, Alexey Dontsov, Maxim Zhelnin, Evgeny Burnaev, Egor Shvetsov (26 Sep 2025)

• RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
  Zukang Xu, Xing Hu, Qiang Wu, Dawei Yang (24 Sep 2025) [MQ]

• FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
  Yuxuan Cai, Xiaozhuan Liang, X. Wang, Jin Ma, Haijin Liang, Jinwen Luo, Xinyu Zuo, Lisheng Duan, Yuyang Yin, Xi Chen (16 Sep 2025)

• Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
  Ryan Lucas, Kayhan Behdin, Zhipeng Wang, Qingquan Song, Shao Tang, Rahul Mazumder (15 Sep 2025) [ReLM, LRM, AI4CE]

• Harnessing Optimization Dynamics for Curvature-Informed Model Merging
  Pouria Mahdavinia, Hamed Mahdavi, Niloofar Mireshghallah, M. Mahdavi (14 Sep 2025) [MoMe]

• Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
  Hang Guo, Yawei Li, Luca Benini (14 Sep 2025) [MQ]

• GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
  Yixuan Tang, Yi Yang (13 Sep 2025)

• COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
  Eugene Kwek, Wenpeng Yin (08 Sep 2025) [VLM]

• From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
  Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Xiaoling Wang (03 Sep 2025) [KELM, AAML]

• LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
  Krishna Teja Chitty-Venkata, Sandeep Madireddy, M. Emani, V. Vishwanath (02 Sep 2025) [MoE]

• Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model
  Zhaofeng Zhong, Wei Yuan, Liang Qu, Tong Chen, Hao Wang, Xiangyu Zhao, Hongzhi Yin (29 Aug 2025)

• Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance
  Yao Wang, Di Liang, Minlong Peng (29 Aug 2025) [MoMe]

• Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions
  Nannan Huang, Haytham M. Fayek, Xiuzhen Zhang (25 Aug 2025)

• DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
  Weilin Cai, Le Qin, Shwai He, Junwei Cui, Ang Li, Jiayi Huang (25 Aug 2025) [MoE]

• Route-and-Execute: Auditable Model-Card Matching and Specialty-Level Deployment
  Shayan Vassef, Soorya Ram Shimegekar, Abhay Goyal, Koustuv Saha, Pi Zonooz, Navin Kumar (22 Aug 2025)

• Z-Pruner: Post-Training Pruning of Large Language Models for Efficiency without Retraining
  Samiul Basir Bhuiyan, Md. Sazzad Hossain Adib, Mohammed Aman Bhuiyan, Muhammad Rafsan Kabir, Moshiur Farazi, Shafin Rahman, Nabeel Mohammed (18 Aug 2025)

• SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy
  Boran Zhao, Haiming Zhai, Zihang Yuan, Hetian Liu, Tian Xia, Wenzhe Zhao, Pengju Ren (18 Aug 2025)

• LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
  Chengtao Lv, Bilang Zhang, Yang Yong, Yazhe Niu, Yushi Huang, Shiqiao Gu, Jiajun Wu, Yumeng Shi, Jinyang Guo, Wenya Wang (13 Aug 2025) [MLLM, VLM]

• EGGS-PTP: An Expander-Graph Guided Structured Post-training Pruning Method for Large Language Models
  Omar Bazarbachi, Zijun Sun, Yanning Shen (13 Aug 2025)

• READER: Retrieval-Assisted Drafter for Efficient LLM Inference
  Maxim Divilkovskiy, Vitaly Malygin, Sergey Zlobin, Sultan Isali, Vasily Kalugin, Stanislav Ilyushin, Nuriza Aitassova, Yi Fei, Zeng Weidi (12 Aug 2025) [RALM]

• P/D-Device: Disaggregated Large Language Model between Cloud and Devices
  Yibo Jin, Yixu Xu, Yue-ting Chen, C. Wang, Tao Wang, ..., Zhe Wang, Hefei Guo, Hongjie Liu, Wei Lu, Zhengyong Zhang (12 Aug 2025)

• Deep Language Geometry: Constructing a Metric Space from LLM Weights
  Maksym Shamrai, Vladyslav Hamolia (08 Aug 2025)

• Pushing the Envelope of LLM Inference on AI-PC
  E. Georganas, Dhiraj D. Kalamkar, Alexander Heinecke (08 Aug 2025) [MQ]

• Pruning Large Language Models by Identifying and Preserving Functional Networks
  Yiheng Liu, Junhao Ning, Sichen Xia, Xiaohui Gao, Ning Qiang, Bao Ge, Junwei Han, Xintao Hu (07 Aug 2025)

• Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos
  Haoyu Zhang, Shihao Zhang, Ian Colbert, Rayan Saab (06 Aug 2025) [MQ]

• LeanK: Learnable K Cache Channel Pruning for Efficient Decoding
  Y. Zhang, Zhiyuan He, Huiqiang Jiang, Chengruidong Zhang, Yuqing Yang, Jianyong Wang, Lili Qiu (04 Aug 2025)

• CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis
  Yuzhuang Xu, Xu Han, Yuanchi Zhang, Yixuan Wang, Yijun Liu, Shiyu Ji, Qingfu Zhu, Wanxiang Che (04 Aug 2025) [MoE, MQ]

• XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding
  Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang (31 Jul 2025)

• Unveiling Super Experts in Mixture-of-Experts Large Language Models
  Zunhai Su, Qingyuan Li, Hao Zhang, YuLei Qian, Yuchen Xie, Kehong Yuan (31 Jul 2025) [MoE]

• Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations
  Nils Hütten, Florian Hölken, Hasan Tercan, Tobias Meisen (29 Jul 2025) [MedIm]

• Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
  Te Zhang, Yuheng Li, Junxiang Wang, Lujun Li (28 Jul 2025)

• LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning
  Yining Huang, Bin Li, Keke Tang, Meilian Chen (28 Jul 2025) [MoE, LRM]

• Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method
  Qingcheng Zhu, Yangyang Ren, L. Yang, Mingbao Lin, Yanjing Li, ..., Haodong Zhu, Yuguang Yang, Juan Zhang, Runqi Wang, Baochang Zhang (24 Jul 2025) [MQ]

• BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
  Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Yuxuan Li, Zhiyuan Liu, Maosong Sun (11 Jul 2025) [MoE]

• DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
  Ruokai Yin, Yuhang Li, Donghyun Lee, Priyadarshini Panda (25 Jun 2025) [VLM]

• Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps
  Jiashun Cheng, Aochuan Chen, Nuo Chen, Ziqi Gao, Yuhan Li, Jia Li, Fugee Tsung (20 Jun 2025)

• SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
  Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Chenfeng Xu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu (19 Jun 2025)

• MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models
  Yan Sun, Qixin Zhang, Zhiyuan Yu, Xikun Zhang, Li Shen, Dacheng Tao (15 Jun 2025)

• Training-free LLM Merging for Multi-task Learning
  Zichuan Fu, Xian Wu, Y. X. R. Wang, Wanyu Wang, Shanshan Ye, Hongzhi Yin, Yi-Ju Chang, Yefeng Zheng, Xiangyu Zhao (14 Jun 2025) [MoMe]

• Compression Aware Certified Training
  Changming Xu, Gagandeep Singh (13 Jun 2025)

• On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
  Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, Aditya Akella (11 Jun 2025)