
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,481 papers shown
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
127
311
0
10 Oct 2023
Progressive Neural Compression for Adaptive Image Offloading under Timing Constraints
Ruiqi Wang
Hanyang Liu
Jiaming Qiu
Moran Xu
Roch Guérin
Chenyang Lu
51
3
0
08 Oct 2023
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang
Wen Fei
Weijia Wu
Yefei He
Zhenyu Lou
Hong Zhou
MQ
66
5
0
07 Oct 2023
Extract-Transform-Load for Video Streams
Ferdinand Kossmann
Ziniu Wu
Eugenie Lai
Nesime Tatbul
Lei Cao
Tim Kraska
Samuel Madden
70
17
0
07 Oct 2023
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
Gintare Karolina Dziugaite
LRM
105
5
0
07 Oct 2023
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
Fred Hohman
Mary Beth Kery
Donghao Ren
Dominik Moritz
116
19
0
06 Oct 2023
Can pruning make Large Language Models more efficient?
Sia Gholami
Marwan Omar
94
13
0
06 Oct 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Filip Szatkowski
Eric Elmoznino
Younesse Kaddar
Simone Scardapane
MoE
66
6
0
06 Oct 2023
Quantized Transformer Language Model Implementations on Edge Devices
Mohammad Wali Ur Rahman
Murad Mehrab Abrar
Hunter Gibbons Copening
Salim Hariri
Sicong Shao
Pratik Satam
Soheil Salehi
MQ
75
11
0
06 Oct 2023
Denoising Diffusion Step-aware Models
Shuai Yang
Yukang Chen
Luozhou Wang
Shu Liu
Ying-Cong Chen
DiffM
147
17
0
05 Oct 2023
ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
Yiming Wang
Jinyu Li
57
6
0
03 Oct 2023
Feather: An Elegant Solution to Effective DNN Sparsification
Athanasios Glentis Georgoulakis
George Retsinas
Petros Maragos
61
1
0
03 Oct 2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
Roberto L. Castro
Andrei Ivanov
Diego Andrade
Tal Ben-Nun
B. Fraguela
Torsten Hoefler
71
17
0
03 Oct 2023
The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers
Rickard Brannvall
58
0
0
03 Oct 2023
DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training
Aochuan Chen
Yimeng Zhang
Jinghan Jia
James Diffenderfer
Jiancheng Liu
Konstantinos Parasyris
Yihua Zhang
Zheng Zhang
B. Kailkhura
Sijia Liu
150
48
0
03 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zhangyang Wang
Yinfei Yang
MQ
132
50
0
02 Oct 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li
Zhenyu Zhang
Prateek Yadav
Yi-Lin Sung
Yu Cheng
Mohit Bansal
Tianlong Chen
MoMe
85
39
0
02 Oct 2023
Faster and Accurate Neural Networks with Semantic Inference
Sazzad Sayyed
Jonathan D. Ashdown
Francesco Restuccia
80
0
0
02 Oct 2023
A Novel IoT Trust Model Leveraging Fully Distributed Behavioral Fingerprinting and Secure Delegation
Marco Arazzi
S. Nicolazzo
Antonino Nocera
64
10
0
02 Oct 2023
ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets
Kaiyuan Tang
Chaoli Wang
77
8
0
02 Oct 2023
Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
Duc Hoang
Minsik Cho
Thomas Merth
Mohammad Rastegari
Zhangyang Wang
KELM CLL
93
5
0
02 Oct 2023
YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
Cyrus Zhou
Zack Hassman
Ruize Xu
Dhirpal Shah
Vaughn Richard
Yanjing Li
108
2
0
01 Oct 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Chengming Zhang
Baixi Sun
Xiaodong Yu
Zhen Xie
Weijian Zheng
K. Iskra
Pete Beckman
Dingwen Tao
55
5
0
29 Sep 2023
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zhangyang Wang
88
7
0
29 Sep 2023
AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices
Lehao Wang
Zhiwen Yu
Haoyi Yu
Sicong Liu
Yaxiong Xie
Bin Guo
Yunxin Liu
56
5
0
27 Sep 2023
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
Sicong Liu
Bin Guo
Cheng Fang
Ziqi Wang
Shiyan Luo
Zimu Zhou
Zhiwen Yu
AI4CE
111
23
0
27 Sep 2023
Efficient Post-training Quantization with FP8 Formats
Haihao Shen
Naveen Mellempudi
Xin He
Q. Gao
Chang‐Bao Wang
Mengni Wang
MQ
99
23
0
26 Sep 2023
Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization
Christopher Subia-Waud
S. Dasmahapatra
UQCV MQ
65
1
0
24 Sep 2023
ThinResNet: A New Baseline for Structured Convolutional Networks Pruning
Hugo Tessier
Ghouti Boukli Hacene
Vincent Gripon
63
1
0
22 Sep 2023
RAI4IoE: Responsible AI for Enabling the Internet of Energy
Minhui Xue
Surya Nepal
Ling Liu
Subbu Sethuvenkatraman
Xingliang Yuan
Carsten Rudolph
Ruoxi Sun
Greg Eisenhauer
113
5
0
20 Sep 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Haojun Xia
Zhen Zheng
Yuchao Li
Donglin Zhuang
Zhongzhu Zhou
Xiafei Qiu
Yong Li
Wei Lin
Shuaiwen Leon Song
102
15
0
19 Sep 2023
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang
Shumin Han
Xiaodi Wang
Jing Hao
Xianbin Cao
Baochang Zhang
VLM
74
0
0
18 Sep 2023
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
George August Wright
Umberto Cappellazzo
Salah Zaiem
Desh Raj
Lucas Ondel Yang
Daniele Falavigna
Mohamed Nabih Ali
Alessio Brutti
75
2
0
18 Sep 2023
Enhancing Quantised End-to-End ASR Models via Personalisation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MQ
64
3
0
17 Sep 2023
Scaling Laws for Sparsely-Connected Foundation Models
Elias Frantar
C. Riquelme
N. Houlsby
Dan Alistarh
Utku Evci
116
38
0
15 Sep 2023
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
Matteo Grimaldi
Darshan C. Ganji
Ivan Lazarevich
Sudhakar Sah
66
10
0
12 Sep 2023
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing
Clifford Broni-Bediako
Junshi Xia
Naoto Yokoya
93
10
0
12 Sep 2023
Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference
Kiwan Maeng
G. E. Suh
58
2
0
09 Sep 2023
Sparse Federated Training of Object Detection in the Internet of Vehicles
Luping Rao
Chuan Ma
Ming Ding
Yuwen Qian
Lu Zhou
Yanfeng Guo
35
2
0
07 Sep 2023
Bandwidth-efficient Inference for Neural Image Compression
Shanzhi Yin
Tongda Xu
Yongsheng Liang
Yuanyuan Wang
Yanghao Li
Yan Wang
Jingjing Liu
55
1
0
06 Sep 2023
Geometry of Sensitivity: Twice Sampling and Hybrid Clipping in Differential Privacy with Optimal Gaussian Noise and Application to Deep Learning
Hanshen Xiao
Jun Wan
Srini Devadas
71
8
0
06 Sep 2023
In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms
Philipp Schilk
Niccolò Polvani
Andrea Ronco
Milos Cernak
Michele Magno
75
12
0
05 Sep 2023
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
Wei Huang
Haotong Qin
Yangdong Liu
Jingzhuo Liang
Yifu Ding
Ying Li
Xianglong Liu
MQ
87
0
0
05 Sep 2023
Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks
Kacem Khaled
Mouna Dhaouadi
F. Magalhães
Gabriela Nicolescu
AAML
34
2
0
04 Sep 2023
On the fly Deep Neural Network Optimization Control for Low-Power Computer Vision
Ishmeet Kaur
Adwaita Janardhan Jadhav
51
0
0
04 Sep 2023
ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation
Nastaran Darabi
Maeesha Binte Hashem
Hongyi Pan
Ahmet Cetin
Wilfred Gomes
A. R. Trivedi
71
6
0
04 Sep 2023
Saturn: An Optimized Data System for Large Model Deep Learning Workloads
Kabir Nagrecha
Arun Kumar
110
6
0
03 Sep 2023
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho
Keivan Alizadeh Vahid
Qichen Fu
Saurabh N. Adya
C. C. D. Mundo
Mohammad Rastegari
Devang Naik
Peter Zatloukal
MQ
90
7
0
02 Sep 2023
Proof of Deep Learning: Approaches, Challenges, and Future Directions
Mahmoud Salhab
Khaleel W. Mershad
73
1
0
31 Aug 2023
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
Yizeng Han
Zeyu Liu
Zhihang Yuan
Yifan Pu
Chaofei Wang
Shiji Song
Gao Huang
113
23
0
30 Aug 2023