1510.00149
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS
Papers citing
"Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
50 / 3,448 papers shown
Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks
Jaewook Lee
Yoel Park
Seulki Lee
VLM
35
1
0
07 Aug 2024
Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments
Angie Boggust
Venkatesh Sivaraman
Yannick Assogba
Donghao Ren
Dominik Moritz
Fred Hohman
VLM
63
3
0
06 Aug 2024
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong
Lujun Li
Dayou Du
Yuhan Chen
Zhenheng Tang
...
Wei Xue
Wenhan Luo
Qi-fei Liu
Yi-Ting Guo
Xiaowen Chu
MQ
58
4
0
03 Aug 2024
Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo
Alexandru Drimbarean
Walsh Simon
Colm O'Riordan
MQ
46
0
0
01 Aug 2024
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang
Yuezhou Hu
Guohao Jian
Jun Zhu
Jianfei Chen
40
5
0
30 Jul 2024
Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs
Seungmin Yu
Xiaodie Yi
Hayun Lee
Dongkun Shin
39
1
0
30 Jul 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi
Hyeyoon Lee
Dain Kwon
Sunjong Park
Kyuyeun Kim
Noseong Park
Jinho Lee
MQ
55
1
0
29 Jul 2024
Parameter-Efficient Fine-Tuning via Circular Convolution
Aochuan Chen
Jiashun Cheng
Zijing Liu
Ziqi Gao
Fugee Tsung
Yu-Feng Li
Jia Li
61
2
0
27 Jul 2024
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li
Yijun Dong
Qi Lei
43
5
0
26 Jul 2024
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu
Benlin Liu
Jiahui Wang
Yuhao Dong
Guangyi Chen
Yongming Rao
Ranjay Krishna
Jiwen Lu
VLM
48
10
0
25 Jul 2024
Accelerating the Low-Rank Decomposed Models
Habib Hajimolahoseini
Walid Ahmed
Austin Wen
Yang Liu
39
0
0
24 Jul 2024
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance
Ao Shen
Qiang Wang
Zhiquan Lai
Xionglve Li
Dongsheng Li
ALM
MQ
37
1
0
24 Jul 2024
MetaAug: Meta-Data Augmentation for Post-Training Quantization
Cuong Pham
Hoang Anh Dung
Cuong C. Nguyen
Trung Le
Dinh Q. Phung
Gustavo Carneiro
Thanh-Toan Do
MQ
46
0
0
20 Jul 2024
Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
Ruizi Han
Jinglei Tang
60
1
0
19 Jul 2024
Reconstruct the Pruned Model without Any Retraining
Pingjie Wang
Ziqing Fan
Shengchao Hu
Zhe Chen
Yanfeng Wang
Yu Wang
53
1
0
18 Jul 2024
CCSRP: Robust Pruning of Spiking Neural Networks through Cooperative Coevolution
J. Reif
Jiakang Li
Songning Lai
Alexander Fay
AAML
42
0
0
18 Jul 2024
INTELLECT: Adapting Cyber Threat Detection to Heterogeneous Computing Environments
Simone Magnani
Liubov Nedoshivina
Roberto Doriguzzi-Corin
Stefano Braghin
Domenico Siracusa
61
0
0
17 Jul 2024
Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference
Ghadeer Jaradat
M. Tolba
Ghada Alsuhli
Hani Saleh
Mahmoud Al-Qutayri
Thanos Stouraitis
Baker Mohammad
45
0
0
17 Jul 2024
Enhancing Split Computing and Early Exit Applications through Predefined Sparsity
Luigi Capogrosso
Enrico Fraccaroli
Giulio Petrozziello
Francesco Setti
Samarjit Chakraborty
Franco Fummi
Marco Cristani
38
3
0
16 Jul 2024
Quality Scalable Quantization Methodology for Deep Learning on Edge
S. Khaliq
Rehan Hafiz
MQ
53
1
0
15 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao
Xiaohan Ding
Juexiao Feng
Yuhong Yang
Hui Chen
Guiguang Ding
VLM
MQ
37
5
0
15 Jul 2024
Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong
Woo-Jin Chung
Hong-Goo Kang
MQ
31
1
0
12 Jul 2024
OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration
Febin P. Sunny
Amin Shafiee
Abhishek Balasubramaniam
Mahdi Nikdast
S. Pasricha
74
1
0
11 Jul 2024
The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others
Daniel Sikar
Artur Garcez
Robin Bloomfield
Tillman Weyde
Kaleem Peeroo
Naman Singh
Maeve Hutchinson
Dany Laksono
Mirela Reljan-Delaney
46
2
0
10 Jul 2024
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala
Alind Khare
Animesh Agrawal
Igor Fedorov
Hugo Latapie
Myungjin Lee
Alexey Tumanov
CLL
49
0
0
08 Jul 2024
Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment
Qizhang Feng
Siva Rajesh Kasa
Santhosh Kumar Kasa
Hyokun Yun
C. Teo
S. Bodapati
92
7
0
08 Jul 2024
Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Eun Som Jeon
Hongjun Choi
A. Shukla
Yuan Wang
Hyunglae Lee
M. Buman
Pavan Turaga
40
3
0
07 Jul 2024
The Impact of Quantization and Pruning on Deep Reinforcement Learning Models
Heng Lu
Mehdi Alemi
Reza Rawassizadeh
47
1
0
05 Jul 2024
Isomorphic Pruning for Vision Models
Gongfan Fang
Xinyin Ma
Michael Bi Mi
Xinchao Wang
VLM
ViT
44
6
0
05 Jul 2024
ISQuant: apply squant to the real deployment
Dezan Zhao
MQ
39
0
0
05 Jul 2024
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Cheng Han
Qifan Wang
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Yi Fang
Qiang Guan
Lifu Huang
Dongfang Liu
VLM
51
4
0
05 Jul 2024
Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao
Feng Tian
Jun Chen
Haonan Lin
Guang Dai
Yong Liu
Jingdong Wang
DiffM
MQ
50
5
0
04 Jul 2024
Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection
Xiaokun Luan
Xiyue Zhang
Jingyi Wang
Meng Sun
AAML
30
0
0
04 Jul 2024
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives
Huanrui Yang
Yafeng Huang
Zhen Dong
Denis A. Gudovskiy
Tomoyuki Okuno
Yohei Nakata
Yuan Du
Kurt Keutzer
Shanghang Zhang
MQ
58
0
0
03 Jul 2024
ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation
Yipin Guo
Zihao Li
Yilin Lang
Qinyuan Ren
68
0
0
03 Jul 2024
Efficient DNN-Powered Software with Fair Sparse Models
Xuanqi Gao
Weipeng Jiang
Juan Zhai
Shiqing Ma
Xiaoyu Zhang
Chao Shen
55
0
0
03 Jul 2024
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu
Zhe Wang
Chunyun Chen
Xue Geng
Jie Lin
Xulei Yang
Min-man Wu
Min Wu
Xiaoli Li
Weisi Lin
ViT
VLM
56
7
0
02 Jul 2024
A Comprehensive Survey on Diffusion Models and Their Applications
M. Ahsan
S. Raman
Yingtao Liu
Zahed Siddique
MedIm
DiffM
46
1
0
01 Jul 2024
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti
Matteo Risso
Luca Bompani
Enrico Macii
Massimo Poncino
Daniele Jahier Pagliari
MQ
65
2
0
01 Jul 2024
Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs
Quanming Yao
Yongqi Zhang
Yaqing Wang
Nan Yin
James Kwok
Qiang Yang
42
0
0
29 Jun 2024
VcLLM: Video Codecs are Secretly Tensor Codecs
Ceyu Xu
Yongji Wu
Xinyu Yang
Beidi Chen
Matthew Lentz
Danyang Zhuo
Lisa Wu Wills
55
0
0
29 Jun 2024
SCOPE: Stochastic Cartographic Occupancy Prediction Engine for Uncertainty-Aware Dynamic Navigation
Zhanteng Xie
P. Dames
47
1
0
28 Jun 2024
Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design
Linping Qu
Yuyi Mao
Shenghui Song
Chi-Ying Tsui
51
0
0
26 Jun 2024
FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization
Linping Qu
Shenghui Song
Chi-Ying Tsui
MQ
FedML
26
4
0
26 Jun 2024
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen
Yuan Meng
Chen Tang
Xinzhu Ma
Jingyan Jiang
Xin Wang
Zhi Wang
Wenwu Zhu
MQ
36
23
0
25 Jun 2024
BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
A. Ramesh
Vignesh Ganapathiraman
I. Laradji
Mark Schmidt
43
1
0
25 Jun 2024
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
Hongkang Li
Meng Wang
Shuai Zhang
Sijia Liu
Pin-Yu Chen
40
6
0
24 Jun 2024
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Ashwinee Panda
Berivan Isik
Xiangyu Qi
Sanmi Koyejo
Tsachy Weissman
Prateek Mittal
MoMe
50
15
0
24 Jun 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu
Zhan Qin
Han Wang
Zhao Yang
Zecheng Wang
...
Zhao Lv
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
36
2
0
24 Jun 2024
Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
Zhe Wang
Yifei Zhu
44
1
0
23 Jun 2024