1510.00149
Cited By
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS
Papers citing
"Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
50 / 3,481 papers shown
Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
Adib Hasan
Ileana Rugina
Alex Wang
AAML
96
24
0
19 Jan 2024
AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
Xuanlei Zhao
Shenggan Cheng
Guangyang Lu
Jiarui Fang
Hao Zhou
Bin Jia
Ziming Liu
Yang You
MQ
82
3
0
19 Jan 2024
SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression
Ho Fung Tsoi
Vladimir Loncar
S. Dasu
Philip C. Harris
241
4
0
18 Jan 2024
DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning
Lixiang Han
Zhen Xiao
Zhenjiang Li
84
6
0
17 Jan 2024
GD doesn't make the cut: Three ways that non-differentiability affects neural network training
Siddharth Krishna Kumar
AAML
81
3
0
16 Jan 2024
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
Cong Guo
Rui Zhang
Jiale Xu
Jingwen Leng
Zihan Liu
...
Minyi Guo
Hao Wu
Shouren Zhao
Junping Zhao
Ke Zhang
VLM
125
12
0
16 Jan 2024
Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning
Manish Sharma
Jamison Heard
Eli Saber
Panos P. Markopoulos
86
1
0
15 Jan 2024
Activations and Gradients Compression for Model-Parallel Training
Mikhail Rudakov
Aleksandr Beznosikov
Yaroslav Kholodov
Alexander Gasnikov
100
2
0
15 Jan 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Namjoon Suh
Guang Cheng
MedIm
109
14
0
14 Jan 2024
UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
Ji Liu
Dehua Tang
Yuanxian Huang
Li Zhang
Xiaocheng Zeng
...
Jinzhang Peng
Yu Wang
Fan Jiang
Lu Tian
Ashish Sirasao
ViT
67
8
0
12 Jan 2024
Body-Area Capacitive or Electric Field Sensing for Human Activity Recognition and Human-Computer Interaction: A Comprehensive Survey
Sizhen Bian
Mengxi Liu
Bo Zhou
P. Lukowicz
Michele Magno
100
12
0
11 Jan 2024
Memory-Efficient Fine-Tuning for Quantized Diffusion Model
Hyogon Ryu
Seohyun Lim
Hyunjung Shim
DiffM
MQ
67
7
0
09 Jan 2024
A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models
Rui-ya Ma
Qiang Zhou
Yizhu Jin
Daquan Zhou
Bangjun Xiao
...
Jingtong Hu
Xiaodong Xie
Zhen Dong
Shanghang Zhang
Shiji Zhou
108
2
0
04 Jan 2024
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang
Yuan Meng
Jiacheng Jiang
Shuzhao Xie
Rongwei Lu
Xinzhu Ma
Zhi Wang
Wenwu Zhu
MQ
73
11
0
03 Jan 2024
Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment
Jie Zhu
Leye Wang
Xiao Han
Anmin Liu
Tao Xie
AAML
117
6
0
02 Jan 2024
One-Shot Multi-Rate Pruning of Graph Convolutional Networks
H. Sahbi
64
0
0
29 Dec 2023
Adaptive Depth Networks with Skippable Sub-Paths
Woochul Kang
76
1
0
27 Dec 2023
Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks
Juyoung Yun
74
1
0
26 Dec 2023
Fairness-Aware Structured Pruning in Transformers
A. Zayed
Gonçalo Mordido
Samira Shabanian
Ioana Baldini
Sarath Chandar
72
19
0
24 Dec 2023
Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization
K. Balaskas
Andreas Karatzas
Christos Sad
K. Siozios
Iraklis Anagnostopoulos
Georgios Zervakis
Jörg Henkel
MQ
79
11
0
23 Dec 2023
Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching
Pengmiao Zhang
Neelesh Gupta
Rajgopal Kannan
Viktor K. Prasanna
78
3
0
23 Dec 2023
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention
Zhen Tan
Tianlong Chen
Zhenyu Zhang
Huan Liu
98
17
0
22 Dec 2023
Resource-Limited Automated Ki67 Index Estimation in Breast Cancer
J. Gliozzo
Giosuè Cataldo Marinò
A. Bonometti
Marco Frasca
Dario Malchiodi
57
0
0
22 Dec 2023
Sparse Training for Federated Learning with Regularized Error Correction
Ran Greidi
Kobi Cohen
FedML
132
2
0
21 Dec 2023
How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark
Eldar Kurtic
Torsten Hoefler
Dan Alistarh
72
3
0
21 Dec 2023
Model-Based Control with Sparse Neural Dynamics
Ziang Liu
Genggeng Zhou
Jeff He
Tobia Marcucci
Fei-Fei Li
Jiajun Wu
Yunzhu Li
AI4CE
92
18
0
20 Dec 2023
Towards Efficient Verification of Quantized Neural Networks
Pei Huang
Haoze Wu
Yuting Yang
Ieva Daukantas
Min Wu
Yedi Zhang
Clark W. Barrett
MQ
86
12
0
20 Dec 2023
Fluctuation-based Adaptive Structured Pruning for Large Language Models
Yongqi An
Xu Zhao
Tao Yu
Ming Tang
Jinqiao Wang
116
61
0
19 Dec 2023
Optimizing Dense Feed-Forward Neural Networks
Luis Balderas
Miguel Lastra
José M. Benítez
87
5
0
16 Dec 2023
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song
Zeyu Mi
Haotong Xie
Haibo Chen
BDL
184
136
0
16 Dec 2023
Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang
Qizhe Zhang
Zijun Gao
Renrui Zhang
Ekaterina Shutova
Shiji Zhou
Shanghang Zhang
129
21
0
15 Dec 2023
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators
Tianyi Chen
Tianyu Ding
Zhihui Zhu
Zeyu Chen
HsiangTao Wu
Ilya Zharkov
Luming Liang
65
4
0
15 Dec 2023
Balanced and Deterministic Weight-sharing Helps Network Performance
Oscar Chang
Hod Lipson
32
0
0
13 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
148
17
0
13 Dec 2023
IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means
Sean Jaffe
Ambuj K. Singh
Francesco Bullo
MQ
69
0
0
12 Dec 2023
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
137
130
0
12 Dec 2023
MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang
Siqi Li
Junhao Chen
Zhuangzhi Chen
Tianxin Huang
Linpeng Peng
Yong-Jin Liu
60
0
0
12 Dec 2023
Measurement-driven neural-network training for integrated magnetic tunnel junction arrays
W. A. Borders
A. Madhavan
M. Daniels
Vasileia Georgiou
Martin Lueker-Boden
Tiffany S. Santos
Patrick M. Braganca
M. D. Stiles
Jabez J. McClelland
Brian D. Hoskins
74
3
0
11 Dec 2023
Sense, Predict, Adapt, Repeat: A Blueprint for Design of New Adaptive AI-Centric Sensing Systems
S. Hor
Amin Arbabian
69
2
0
11 Dec 2023
FP8-BERT: Post-Training Quantization for Transformer
Jianwei Li
Tianchi Zhang
Ian En-Hsu Yen
Dongkuan Xu
MQ
69
5
0
10 Dec 2023
ESPN: Memory-Efficient Multi-Vector Information Retrieval
Susav Shrestha
Narasimha Reddy
Zongwang Li
76
7
0
09 Dec 2023
A Masked Pruning Approach for Dimensionality Reduction in Communication-Efficient Federated Learning Systems
Tamir L. S. Gez
Kobi Cohen
62
3
0
06 Dec 2023
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit
Fanfei Meng
Lele Zhang
Yu Chen
Yuxin Wang
69
10
0
05 Dec 2023
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
Can Jin
Tianjin Huang
Yihua Zhang
Mykola Pechenizkiy
Sijia Liu
Shiwei Liu
Tianlong Chen
VLM
152
26
0
03 Dec 2023
The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models
Srinath Namburi
Makesh Narsimhan Sreedhar
Srinath Srinivasan
Frederic Sala
MQ
77
11
0
01 Dec 2023
A Compact Implicit Neural Representation for Efficient Storage of Massive 4D Functional Magnetic Resonance Imaging
Ruoran Li
Runzhao Yang
Wenxin Xiang
Yuxiao Cheng
Tingxiong Xiao
J. Suo
AI4CE
84
0
0
30 Nov 2023
Towards Higher Ranks via Adversarial Weight Pruning
Yuchuan Tian
Hanting Chen
Tianyu Guo
Chao Xu
Yunhe Wang
75
2
0
29 Nov 2023
Relationship between Model Compression and Adversarial Robustness: A Review of Current Evidence
Svetlana Pavlitska
Hannes Grolig
J. Marius Zöllner
AAML
140
3
0
27 Nov 2023
BinaryHPE: 3D Human Pose and Shape Estimation via Binarization
Zhiteng Li
Yulun Zhang
Jing Lin
Haotong Qin
Jinjin Gu
Xin Yuan
Linghe Kong
Xiaokang Yang
3DH
139
1
0
24 Nov 2023
When Side-Channel Attacks Break the Black-Box Property of Embedded Artificial Intelligence
Benoît Coqueret
Mathieu Carbone
Olivier Sentieys
Gabriel Zaid
93
2
0
23 Nov 2023