2301.00774
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
2 January 2023
Elias Frantar
Dan Alistarh
VLM
HuggingFace (3 upvotes)
Github (799★)
Papers citing
"SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"
50 / 287 papers shown
Title
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Haoxin Wang
Xiaolong Tu
Hongyu Ke
Huirong Chai
Dawei Chen
Kyungtae Han
0
0
0
07 Oct 2025
Accelerating Attention with Basis Decomposition
Jialin Zhao
8
0
0
02 Oct 2025
Small is Sufficient: Reducing the World AI Energy Consumption Through Model Selection
Tiago da Silva Barros
Frédéric Giroire
Ramon Aparicio-Pardo
Joanna Moulierac
4
0
0
02 Oct 2025
Layer-wise dynamic rank for compressing large language models
Zhendong Mi
Bian Sun
Grace Li Zhang
Shaoyi Huang
ALM
40
0
0
30 Sep 2025
PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning
Xin Yu
Cong Xie
Ziyu Zhao
Tiantian Fan
Lingzhou Xue
Zhi-Li Zhang
0
0
0
30 Sep 2025
Differentiable Sparsity via D-Gating: Simple and Versatile Structured Penalization
Chris Kolb
Laetitia Frost
B. Bischl
David Rügamer
60
0
0
28 Sep 2025
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri
Mohammad Mozaffari
M. Dehnavi
12
0
0
27 Sep 2025
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Tianao Zhang
Zhiteng Li
Xianglong Yan
Haotong Qin
Yong Guo
Yulun Zhang
MQ
16
0
0
27 Sep 2025
COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Dmitriy Shopkhoev
Denis Makhov
Magauiya Zhussip
Ammar Ali
Stamatios Lefkimmiatis
28
0
0
26 Sep 2025
Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs
Shirin Alanova
Kristina Kazistova
Ekaterina Galaeva
Alina Kostromina
Vladimir Smirnov
Redko Dmitry
Alexey Dontsov
Maxim Zhelnin
Evgeny Burnaev
Egor Shvetsov
8
0
0
26 Sep 2025
RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
Zukang Xu
Xing Hu
Qiang Wu
Dawei Yang
MQ
40
0
0
24 Sep 2025
FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
Yuxuan Cai
Xiaozhuan Liang
X. Wang
Jin Ma
Haijin Liang
Jinwen Luo
Xinyu Zuo
Lisheng Duan
Yuyang Yin
Xi Chen
16
0
0
16 Sep 2025
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
Ryan Lucas
Kayhan Behdin
Zhipeng Wang
Qingquan Song
Shao Tang
Rahul Mazumder
ReLM
LRM
AI4CE
8
0
0
15 Sep 2025
Harnessing Optimization Dynamics for Curvature-Informed Model Merging
Pouria Mahdavinia
Hamed Mahdavi
Niloofar Mireshghallah
M. Mahdavi
MoMe
39
0
0
14 Sep 2025
Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
Hang Guo
Yawei Li
Luca Benini
MQ
102
0
0
14 Sep 2025
GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
Yixuan Tang
Yi Yang
24
0
0
13 Sep 2025
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
Eugene Kwek
Wenpeng Yin
VLM
32
0
0
08 Sep 2025
From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
Yue Li
Xin Yi
Dongsheng Shi
Yongyi Cui
Gerard de Melo
Xiaoling Wang
KELM
AAML
32
1
0
03 Sep 2025
LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
Krishna Teja Chitty-Venkata
Sandeep Madireddy
M. Emani
V. Vishwanath
MoE
53
0
0
02 Sep 2025
Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model
Zhaofeng Zhong
Wei Yuan
Liang Qu
Tong Chen
Hao Wang
Xiangyu Zhao
Hongzhi Yin
42
0
0
29 Aug 2025
Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance
Yao Wang
Di Liang
Minlong Peng
MoMe
126
2
0
29 Aug 2025
Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions
Nannan Huang
Haytham M. Fayek
Xiuzhen Zhang
32
0
0
25 Aug 2025
DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Weilin Cai
Le Qin
Shwai He
Junwei Cui
Ang Li
Jiayi Huang
MoE
68
0
0
25 Aug 2025
Route-and-Execute: Auditable Model-Card Matching and Specialty-Level Deployment
Shayan Vassef
Soorya Ram Shimegekar
Abhay Goyal
Koustuv Saha
Pi Zonooz
Navin Kumar
44
0
0
22 Aug 2025
Z-Pruner: Post-Training Pruning of Large Language Models for Efficiency without Retraining
Samiul Basir Bhuiyan
Md. Sazzad Hossain Adib
Mohammed Aman Bhuiyan
Muhammad Rafsan Kabir
Moshiur Farazi
Shafin Rahman
Nabeel Mohammed
20
0
0
18 Aug 2025
SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy
Boran Zhao
Haiming Zhai
Zihang Yuan
Hetian Liu
Tian Xia
Wenzhe Zhao
Pengju Ren
14
1
0
18 Aug 2025
LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
Chengtao Lv
Bilang Zhang
Yang Yong
Yazhe Niu
Yushi Huang
Shiqiao Gu
Jiajun Wu
Yumeng Shi
Jinyang Guo
Wenya Wang
MLLM
VLM
24
0
0
13 Aug 2025
EGGS-PTP: An Expander-Graph Guided Structured Post-training Pruning Method for Large Language Models
Omar Bazarbachi
Zijun Sun
Yanning Shen
36
0
0
13 Aug 2025
READER: Retrieval-Assisted Drafter for Efficient LLM Inference
Maxim Divilkovskiy
Vitaly Malygin
Sergey Zlobin
Sultan Isali
Vasily Kalugin
Stanislav Ilyushin
Nuriza Aitassova
Yi Fei
Zeng Weidi
RALM
24
0
0
12 Aug 2025
P/D-Device: Disaggregated Large Language Model between Cloud and Devices
Yibo Jin
Yixu Xu
Yue-ting Chen
C. Wang
Tao Wang
...
Zhe Wang
Hefei Guo
Hongjie Liu
Wei Lu
Zhengyong Zhang
36
0
0
12 Aug 2025
Deep Language Geometry: Constructing a Metric Space from LLM Weights
Maksym Shamrai
Vladyslav Hamolia
24
0
0
08 Aug 2025
Pushing the Envelope of LLM Inference on AI-PC
E. Georganas
Dhiraj D. Kalamkar
Alexander Heinecke
MQ
36
0
0
08 Aug 2025
Pruning Large Language Models by Identifying and Preserving Functional Networks
Yiheng Liu
Junhao Ning
Sichen Xia
Xiaohui Gao
Ning Qiang
Bao Ge
Junwei Han
Xintao Hu
40
0
0
07 Aug 2025
Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos
Haoyu Zhang
Shihao Zhang
Ian Colbert
Rayan Saab
MQ
53
2
0
06 Aug 2025
LeanK: Learnable K Cache Channel Pruning for Efficient Decoding
Y. Zhang
Zhiyuan He
Huiqiang Jiang
Chengruidong Zhang
Yuqing Yang
Jianyong Wang
Lili Qiu
32
0
0
04 Aug 2025
CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis
Yuzhuang Xu
Xu Han
Yuanchi Zhang
Yixuan Wang
Yijun Liu
Shiyu Ji
Qingfu Zhu
Wanxiang Che
MoE
MQ
87
1
0
04 Aug 2025
XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding
Dian Chen
Yansong Qu
Xinyang Li
Ming Li
Shengchuan Zhang
65
1
0
31 Jul 2025
Unveiling Super Experts in Mixture-of-Experts Large Language Models
Zunhai Su
Qingyuan Li
Hao Zhang
YuLei Qian
Yuchen Xie
Kehong Yuan
MoE
64
2
0
31 Jul 2025
Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations
Nils Hütten
Florian Hölken
Hasan Tercan
Tobias Meisen
MedIm
45
0
0
29 Jul 2025
Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
Te Zhang
Yuheng Li
Junxiang Wang
Lujun Li
48
0
0
28 Jul 2025
LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning
Yining Huang
Bin Li
Keke Tang
Meilian Chen
MoE
LRM
81
1
0
28 Jul 2025
Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method
Qingcheng Zhu
Yangyang Ren
L. Yang
Mingbao Lin
Yanjing Li
...
Haodong Zhu
Yuguang Yang
Juan Zhang
Runqi Wang
Baochang Zhang
MQ
57
0
0
24 Jul 2025
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Chenyang Song
Weilin Zhao
Xu Han
Chaojun Xiao
Yingfa Chen
Yuxuan Li
Zhiyuan Liu
Maosong Sun
MoE
115
0
0
11 Jul 2025
DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
Ruokai Yin
Yuhang Li
Donghyun Lee
Priyadarshini Panda
VLM
35
1
0
25 Jun 2025
Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps
Jiashun Cheng
Aochuan Chen
Nuo Chen
Ziqi Gao
Yuhan Li
Jia Li
Fugee Tsung
112
0
0
20 Jun 2025
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Samir Khaki
Xiuyu Li
Junxian Guo
Ligeng Zhu
Chenfeng Xu
Konstantinos N. Plataniotis
Amir Yazdanbakhsh
Kurt Keutzer
Song Han
Zhijian Liu
104
1
0
19 Jun 2025
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models
Yan Sun
Qixin Zhang
Zhiyuan Yu
Xikun Zhang
Li Shen
Dacheng Tao
99
1
0
15 Jun 2025
Training-free LLM Merging for Multi-task Learning
Zichuan Fu
Xian Wu
Y. X. R. Wang
Wanyu Wang
Shanshan Ye
Hongzhi Yin
Yi-Ju Chang
Yefeng Zheng
Xiangyu Zhao
MoMe
96
1
0
14 Jun 2025
Compression Aware Certified Training
Changming Xu
Gagandeep Singh
86
0
0
13 Jun 2025
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
Yeonju Ro
Zhenyu Zhang
Souvik Kundu
Zhangyang Wang
Aditya Akella
227
1
0
11 Jun 2025