Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015 · Song Han, Huizi Mao, W. Dally · 3DGS
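As context for the citing papers listed below, here is a minimal, illustrative NumPy sketch of the first two stages named in the title: magnitude pruning followed by k-means weight sharing (quantization). This is not the authors' implementation; the retraining and Huffman-coding stages are omitted, and all function names are our own.

    # Illustrative sketch only: magnitude pruning + k-means weight sharing,
    # the two ideas named in the paper's title. Retraining of the pruned
    # network, centroid fine-tuning, and Huffman coding are omitted.
    import numpy as np

    def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
        """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)
        if k == 0:
            return weights.copy()
        threshold = np.partition(flat, k - 1)[k - 1]
        mask = np.abs(weights) > threshold
        return weights * mask

    def kmeans_quantize(weights: np.ndarray, n_clusters: int = 16, iters: int = 20):
        """Cluster surviving weights so each stores only a small codebook index."""
        nz = weights[weights != 0.0]
        # Linear initialization over the weight range, a common choice for weight sharing.
        centroids = np.linspace(nz.min(), nz.max(), n_clusters)
        for _ in range(iters):
            assign = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
            for c in range(n_clusters):
                members = nz[assign == c]
                if members.size:
                    centroids[c] = members.mean()
        # Rebuild a weight matrix that uses only codebook values.
        quantized = weights.copy()
        idx = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        quantized[weights != 0.0] = centroids[idx]
        return quantized, centroids

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        w = rng.normal(size=(64, 64)).astype(np.float32)
        pruned = magnitude_prune(w, sparsity=0.9)              # keep ~10% of weights
        q, codebook = kmeans_quantize(pruned, n_clusters=16)   # 4-bit weight sharing
        print("nonzero:", np.count_nonzero(q), "distinct values:", np.unique(q[q != 0]).size)

In the original pipeline the pruned network is retrained, the shared centroids are fine-tuned with gradients accumulated per cluster, and the final sparse indices and codebook entries are Huffman-coded; the sketch above only shows the data transformation.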

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,448 papers shown
Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks
Jaewook Lee, Yoel Park, Seulki Lee · VLM · 07 Aug 2024
Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments
Angie Boggust, Venkatesh Sivaraman, Yannick Assogba, Donghao Ren, Dominik Moritz, Fred Hohman · VLM · 06 Aug 2024
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, ..., Wei Xue, Wenhan Luo, Qi-fei Liu, Yi-Ting Guo, Xiaowen Chu · MQ · 03 Aug 2024
Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo, Alexandru Drimbarean, Walsh Simon, Colm O'Riordan · MQ · 01 Aug 2024
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen · 30 Jul 2024
Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs
Seungmin Yu, Xiaodie Yi, Hayun Lee, Dongkun Shin · 30 Jul 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon, Sunjong Park, Kyuyeun Kim, Noseong Park, Jinho Lee · MQ · 29 Jul 2024
Parameter-Efficient Fine-Tuning via Circular Convolution
Aochuan Chen, Jiashun Cheng, Zijing Liu, Ziqi Gao, Fugee Tsung, Yu-Feng Li, Jia Li · 27 Jul 2024
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li, Yijun Dong, Qi Lei · 26 Jul 2024
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu · VLM · 25 Jul 2024
Accelerating the Low-Rank Decomposed Models
Habib Hajimolahoseini, Walid Ahmed, Austin Wen, Yang Liu · 24 Jul 2024
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance
Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li · ALM, MQ · 24 Jul 2024
MetaAug: Meta-Data Augmentation for Post-Training Quantization
Cuong Pham, Hoang Anh Dung, Cuong C. Nguyen, Trung Le, Dinh Q. Phung, Gustavo Carneiro, Thanh-Toan Do · MQ · 20 Jul 2024
Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
Ruizi Han, Jinglei Tang · 19 Jul 2024
Reconstruct the Pruned Model without Any Retraining
Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang · 18 Jul 2024
CCSRP: Robust Pruning of Spiking Neural Networks through Cooperative Coevolution
J. Reif, Jiakang Li, Songning Lai, Alexander Fay · AAML · 18 Jul 2024
INTELLECT: Adapting Cyber Threat Detection to Heterogeneous Computing Environments
Simone Magnani, Liubov Nedoshivina, Roberto Doriguzzi-Corin, Stefano Braghin, Domenico Siracusa · 17 Jul 2024
Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference
Ghadeer Jaradat, M. Tolba, Ghada Alsuhli, Hani Saleh, Mahmoud Al-Qutayri, Thanos Stouraitis, Baker Mohammad · 17 Jul 2024
Enhancing Split Computing and Early Exit Applications through Predefined Sparsity
Luigi Capogrosso, Enrico Fraccaroli, Giulio Petrozziello, Francesco Setti, Samarjit Chakraborty, Franco Fummi, Marco Cristani · 16 Jul 2024
Quality Scalable Quantization Methodology for Deep Learning on Edge
S. Khaliq, Rehan Hafiz · MQ · 15 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding · VLM, MQ · 15 Jul 2024
Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong, Woo-Jin Chung, Hong-Goo Kang · MQ · 12 Jul 2024
OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration
Febin P. Sunny, Amin Shafiee, Abhishek Balasubramaniam, Mahdi Nikdast, S. Pasricha · 11 Jul 2024
The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others
Daniel Sikar, Artur Garcez, Robin Bloomfield, Tillman Weyde, Kaleem Peeroo, Naman Singh, Maeve Hutchinson, Dany Laksono, Mirela Reljan-Delaney · 10 Jul 2024
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala, Alind Khare, Animesh Agrawal, Igor Fedorov, Hugo Latapie, Myungjin Lee, Alexey Tumanov · CLL · 08 Jul 2024
Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment
Qizhang Feng, Siva Rajesh Kasa, Santhosh Kumar Kasa, Hyokun Yun, C. Teo, S. Bodapati · 08 Jul 2024
Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Eun Som Jeon, Hongjun Choi, A. Shukla, Yuan Wang, Hyunglae Lee, M. Buman, Pavan Turaga · 07 Jul 2024
The Impact of Quantization and Pruning on Deep Reinforcement Learning Models
Heng Lu, Mehdi Alemi, Reza Rawassizadeh · 05 Jul 2024
Isomorphic Pruning for Vision Models
Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang · VLM, ViT · 05 Jul 2024
ISQuant: apply squant to the real deployment
Dezan Zhao · MQ · 05 Jul 2024
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Cheng Han, Qifan Wang, S. Dianat, Majid Rabbani, Raghuveer M. Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu · VLM · 05 Jul 2024
Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang · DiffM, MQ · 04 Jul 2024
Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection
Xiaokun Luan, Xiyue Zhang, Jingyi Wang, Meng Sun · AAML · 04 Jul 2024
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives
Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A. Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, Shanghang Zhang · MQ · 03 Jul 2024
ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation
Yipin Guo, Zihao Li, Yilin Lang, Qinyuan Ren · 03 Jul 2024
Efficient DNN-Powered Software with Fair Sparse Models
Xuanqi Gao, Weipeng Jiang, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen · 03 Jul 2024
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min-man Wu, Min Wu, Xiaoli Li, Weisi Lin · ViT, VLM · 02 Jul 2024
A Comprehensive Survey on Diffusion Models and Their Applications
M. Ahsan, S. Raman, Yingtao Liu, Zahed Siddique · MedIm, DiffM · 01 Jul 2024
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti, Matteo Risso, Luca Bompani, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari · MQ · 01 Jul 2024
Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs
Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang · 29 Jun 2024
VcLLM: Video Codecs are Secretly Tensor Codecs
Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills · 29 Jun 2024
SCOPE: Stochastic Cartographic Occupancy Prediction Engine for Uncertainty-Aware Dynamic Navigation
Zhanteng Xie, P. Dames · 28 Jun 2024
Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design
Linping Qu, Yuyi Mao, Shenghui Song, Chi-Ying Tsui · 26 Jun 2024
FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization
Linping Qu, Shenghui Song, Chi-Ying Tsui · MQ, FedML · 26 Jun 2024
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu · MQ · 25 Jun 2024
BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
A. Ramesh, Vignesh Ganapathiraman, I. Laradji, Mark Schmidt · 25 Jun 2024
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
Hongkang Li, Meng Wang, Shuai Zhang, Sijia Liu, Pin-Yu Chen · 24 Jun 2024
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal · MoMe · 24 Jun 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu, Zhan Qin, Han Wang, Zhao Yang, Zecheng Wang, ..., Zhao Lv, Zhiying Tu, Dianhui Chu, Bo Li, Dianbo Sui · 24 Jun 2024
Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
Zhe Wang, Yifei Zhu · 23 Jun 2024