arXiv:2301.00774 (v3, latest)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
2 January 2023
Elias Frantar, Dan Alistarh
VLM
Links: arXiv (abs) · PDF · HTML · HuggingFace (3 upvotes) · GitHub (799★)
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" (50 of 294 papers shown)
RCPU: Rotation-Constrained Error Compensation for Structured Pruning of a Large Language Model
Shuichiro Haruta, Kazunori Matsumoto, Zhi Li, Yanan Wang, Mori Kurokawa · 0/0/0 · 09 Oct 2025

SliceFine: The Universal Winning-Slice Hypothesis for Pretrained Networks
Md. Kowsher, Ali O. Polat, Ehsan Mohammady Ardehaly, Mehrdad Salehi, Zia Ghiasi, Prasanth Murali, Chen Chen · 0/0/0 · 09 Oct 2025

Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation
Arjun Krishnakumar, R. Sukthanker, Hannan Javed Mahadik, Gabriela Kadlecová, Vladyslav Moroshan, Timur Carstensen, Frank Hutter, Aaron Klein · 12/0/0 · 08 Oct 2025

lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Haoxin Wang, Xiaolong Tu, Hongyu Ke, Huirong Chai, Dawei Chen, Kyungtae Han · 8/0/0 · 07 Oct 2025
Expand Neurons, Not Parameters
Linghao Kong, Inimai Subramanian, Yonadav Shavit, Micah Adler, Dan Alistarh, Nir Shavit · 20/0/0 · 06 Oct 2025

The Curious Case of In-Training Compression of State Space Models
Makram Chahine, Philipp Nazari, Daniela Rus, T. Konstantin Rusch · 8/0/0 · 03 Oct 2025

Accelerating Attention with Basis Decomposition
Jialin Zhao · 20/0/0 · 02 Oct 2025

Small is Sufficient: Reducing the World AI Energy Consumption Through Model Selection
Tiago da Silva Barros, Frédéric Giroire, Ramon Aparicio-Pardo, Joanna Moulierac · 20/0/0 · 02 Oct 2025

PrunedLoRA: Robust Gradient-Based Structured Pruning for Low-rank Adaptation in Fine-tuning
Xin Yu, Cong Xie, Ziyu Zhao, Tiantian Fan, Lingzhou Xue, Zhi-Li Zhang · 28/0/0 · 30 Sep 2025
Layer-wise dynamic rank for compressing large language models
Zhendong Mi, Bian Sun, Grace Li Zhang, Shaoyi Huang · ALM · 68/0/0 · 30 Sep 2025

UniPruning: Unifying Local Metric and Global Feedback for Scalable Sparse LLMs
Yizhuo Ding, Wanying Qu, Jiawei Geng, Wenqi Shao, Yanwei Fu · 12/0/0 · 29 Sep 2025

Differentiable Sparsity via D-Gating: Simple and Versatile Structured Penalization
Chris Kolb, Laetitia Frost, B. Bischl, David Rügamer · 84/0/0 · 28 Sep 2025

PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri, Mohammad Mozaffari, M. Dehnavi · 20/0/0 · 27 Sep 2025

Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Tianao Zhang, Zhiteng Li, Xianglong Yan, Haotong Qin, Yong Guo, Yulun Zhang · MQ · 25/0/0 · 27 Sep 2025
Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs
Shirin Alanova, Kristina Kazistova, Ekaterina Galaeva, Alina Kostromina, Vladimir Smirnov, Redko Dmitry, Alexey Dontsov, Maxim Zhelnin, Evgeny Burnaev, Egor Shvetsov · 20/0/0 · 26 Sep 2025

COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Dmitriy Shopkhoev, Denis Makhov, Magauiya Zhussip, Ammar Ali, Stamatios Lefkimmiatis · 60/0/0 · 26 Sep 2025

RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
Zukang Xu, Xing Hu, Qiang Wu, Dawei Yang · MQ · 64/0/0 · 24 Sep 2025

FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
Yuxuan Cai, Xiaozhuan Liang, X. Wang, Jin Ma, Haijin Liang, Jinwen Luo, Xinyu Zuo, Lisheng Duan, Yuyang Yin, Xi Chen · 32/0/0 · 16 Sep 2025
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
Ryan Lucas, Kayhan Behdin, Zhipeng Wang, Qingquan Song, Shao Tang, Rahul Mazumder · ReLM, LRM, AI4CE · 20/0/0 · 15 Sep 2025

Harnessing Optimization Dynamics for Curvature-Informed Model Merging
Pouria Mahdavinia, Hamed Mahdavi, Niloofar Mireshghallah, M. Mahdavi · MoMe · 59/0/0 · 14 Sep 2025

Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
Hang Guo, Yawei Li, Luca Benini · MQ · 114/0/0 · 14 Sep 2025

GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
Yixuan Tang, Yi Yang · 44/0/0 · 13 Sep 2025

COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
Eugene Kwek, Wenpeng Yin · VLM · 52/0/0 · 08 Sep 2025
From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Xiaoling Wang · KELM, AAML · 60/1/0 · 03 Sep 2025

LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
Krishna Teja Chitty-Venkata, Sandeep Madireddy, M. Emani, V. Vishwanath · MoE · 65/0/0 · 02 Sep 2025

Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance
Yao Wang, Di Liang, Minlong Peng · MoMe · 150/2/0 · 29 Aug 2025

Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model
Zhaofeng Zhong, Wei Yuan, Liang Qu, Tong Chen, Hao Wang, Xiangyu Zhao, Hongzhi Yin · 58/0/0 · 29 Aug 2025

Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions
Nannan Huang, Haytham M. Fayek, Xiuzhen Zhang · 39/0/0 · 25 Aug 2025
DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Weilin Cai, Le Qin, Shwai He, Junwei Cui, Ang Li, Jiayi Huang · MoE · 72/0/0 · 25 Aug 2025

Route-and-Execute: Auditable Model-Card Matching and Specialty-Level Deployment
Shayan Vassef, Soorya Ram Shimegekar, Abhay Goyal, Koustuv Saha, Pi Zonooz, Navin Kumar · 72/0/0 · 22 Aug 2025

Z-Pruner: Post-Training Pruning of Large Language Models for Efficiency without Retraining
Samiul Basir Bhuiyan, Md. Sazzad Hossain Adib, Mohammed Aman Bhuiyan, Muhammad Rafsan Kabir, Moshiur Farazi, Shafin Rahman, Nabeel Mohammed · 28/0/0 · 18 Aug 2025

SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy
Boran Zhao, Haiming Zhai, Zihang Yuan, Hetian Liu, Tian Xia, Wenzhe Zhao, Pengju Ren · 26/1/0 · 18 Aug 2025

EGGS-PTP: An Expander-Graph Guided Structured Post-training Pruning Method for Large Language Models
Omar Bazarbachi, Zijun Sun, Yanning Shen · 48/0/0 · 13 Aug 2025
LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
Chengtao Lv, Bilang Zhang, Yang Yong, Yazhe Niu, Yushi Huang, Shiqiao Gu, Jiajun Wu, Yumeng Shi, Jinyang Guo, Wenya Wang · MLLM, VLM · 35/0/0 · 13 Aug 2025

P/D-Device: Disaggregated Large Language Model between Cloud and Devices
Yibo Jin, Yixu Xu, Yue-ting Chen, C. Wang, Tao Wang, ..., Zhe Wang, Hefei Guo, Hongjie Liu, Wei Lu, Zhengyong Zhang · 56/0/0 · 12 Aug 2025

READER: Retrieval-Assisted Drafter for Efficient LLM Inference
Maxim Divilkovskiy, Vitaly Malygin, Sergey Zlobin, Sultan Isali, Vasily Kalugin, Stanislav Ilyushin, Nuriza Aitassova, Yi Fei, Zeng Weidi · RALM · 40/0/0 · 12 Aug 2025

Deep Language Geometry: Constructing a Metric Space from LLM Weights
Maksym Shamrai, Vladyslav Hamolia · 32/0/0 · 08 Aug 2025

Pushing the Envelope of LLM Inference on AI-PC
E. Georganas, Dhiraj D. Kalamkar, Alexander Heinecke · MQ · 44/0/0 · 08 Aug 2025
Pruning Large Language Models by Identifying and Preserving Functional Networks
Yiheng Liu, Junhao Ning, Sichen Xia, Xiaohui Gao, Ning Qiang, Bao Ge, Junwei Han, Xintao Hu · 56/0/0 · 07 Aug 2025

Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos
Haoyu Zhang, Shihao Zhang, Ian Colbert, Rayan Saab · MQ · 69/2/0 · 06 Aug 2025

CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis
Yuzhuang Xu, Xu Han, Yuanchi Zhang, Yixuan Wang, Yijun Liu, Shiyu Ji, Qingfu Zhu, Wanxiang Che · MoE, MQ · 119/1/0 · 04 Aug 2025

LeanK: Learnable K Cache Channel Pruning for Efficient Decoding
Y. Zhang, Zhiyuan He, Huiqiang Jiang, Chengruidong Zhang, Yuqing Yang, Jianyong Wang, Lili Qiu · 44/0/0 · 04 Aug 2025

XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding
Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang · 85/1/0 · 31 Jul 2025
Unveiling Super Experts in Mixture-of-Experts Large Language Models
Zunhai Su, Qingyuan Li, Hao Zhang, YuLei Qian, Yuchen Xie, Kehong Yuan · MoE · 76/2/0 · 31 Jul 2025

Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations
Nils Hütten, Florian Hölken, Hasan Tercan, Tobias Meisen · MedIm · 50/0/0 · 29 Jul 2025

Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
Te Zhang, Yuheng Li, Junxiang Wang, Lujun Li · 56/0/0 · 28 Jul 2025

LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning
Yining Huang, Bin Li, Keke Tang, Meilian Chen · MoE, LRM · 95/1/0 · 28 Jul 2025

Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method
Qingcheng Zhu, Yangyang Ren, L. Yang, Mingbao Lin, Yanjing Li, ..., Haodong Zhu, Yuguang Yang, Juan Zhang, Runqi Wang, Baochang Zhang · MQ · 57/0/0 · 24 Jul 2025
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Yuxuan Li, Zhiyuan Liu, Maosong Sun · MoE · 139/0/0 · 11 Jul 2025

DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
Ruokai Yin, Yuhang Li, Donghyun Lee, Priyadarshini Panda · VLM · 55/1/0 · 25 Jun 2025