
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,481 papers shown
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
127
311
0
10 Oct 2023
Progressive Neural Compression for Adaptive Image Offloading under Timing Constraints
Ruiqi Wang
Hanyang Liu
Jiaming Qiu
Moran Xu
Roch Guérin
Chenyang Lu
51
3
0
08 Oct 2023
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang
Wen Fei
Weijia Wu
Yefei He
Zhenyu Lou
Hong Zhou
MQ
66
5
0
07 Oct 2023
Extract-Transform-Load for Video Streams
Ferdinand Kossmann
Ziniu Wu
Eugenie Lai
Nesime Tatbul
Lei Cao
Tim Kraska
Samuel Madden
70
17
0
07 Oct 2023
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
Gintare Karolina Dziugaite
LRM
105
5
0
07 Oct 2023
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
Fred Hohman
Mary Beth Kery
Donghao Ren
Dominik Moritz
116
19
0
06 Oct 2023
Can pruning make Large Language Models more efficient?
Sia Gholami
Marwan Omar
94
13
0
06 Oct 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Filip Szatkowski
Eric Elmoznino
Younesse Kaddar
Simone Scardapane
MoE
66
6
0
06 Oct 2023
Quantized Transformer Language Model Implementations on Edge Devices
Mohammad Wali Ur Rahman
Murad Mehrab Abrar
Hunter Gibbons Copening
Salim Hariri
Sicong Shao
Pratik Satam
Soheil Salehi
MQ
75
11
0
06 Oct 2023
Denoising Diffusion Step-aware Models
Shuai Yang
Yukang Chen
Luozhou Wang
Shu Liu
Ying-Cong Chen
DiffM
147
17
0
05 Oct 2023
ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
Yiming Wang
Jinyu Li
57
6
0
03 Oct 2023
Feather: An Elegant Solution to Effective DNN Sparsification
Athanasios Glentis Georgoulakis
George Retsinas
Petros Maragos
61
1
0
03 Oct 2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
Roberto L. Castro
Andrei Ivanov
Diego Andrade
Tal Ben-Nun
B. Fraguela
Torsten Hoefler
71
17
0
03 Oct 2023
The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers
Rickard Brannvall
58
0
0
03 Oct 2023
DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training
Aochuan Chen
Yimeng Zhang
Jinghan Jia
James Diffenderfer
Jiancheng Liu
Konstantinos Parasyris
Yihua Zhang
Zheng Zhang
B. Kailkhura
Sijia Liu
150
48
0
03 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zhangyang Wang
Yinfei Yang
MQ
132
50
0
02 Oct 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li
Zhenyu Zhang
Prateek Yadav
Yi-Lin Sung
Yu Cheng
Mohit Bansal
Tianlong Chen
MoMe
85
39
0
02 Oct 2023
Faster and Accurate Neural Networks with Semantic Inference
Sazzad Sayyed
Jonathan D. Ashdown
Francesco Restuccia
80
0
0
02 Oct 2023
A Novel IoT Trust Model Leveraging Fully Distributed Behavioral Fingerprinting and Secure Delegation
Marco Arazzi
S. Nicolazzo
Antonino Nocera
64
10
0
02 Oct 2023
ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets
Kaiyuan Tang
Chaoli Wang
77
8
0
02 Oct 2023
Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
Duc Hoang
Minsik Cho
Thomas Merth
Mohammad Rastegari
Zhangyang Wang
KELM CLL
93
5
0
02 Oct 2023
YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
Cyrus Zhou
Zack Hassman
Ruize Xu
Dhirpal Shah
Vaughn Richard
Yanjing Li
108
2
0
01 Oct 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Chengming Zhang
Baixi Sun
Xiaodong Yu
Zhen Xie
Weijian Zheng
K. Iskra
Pete Beckman
Dingwen Tao
55
5
0
29 Sep 2023
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zhangyang Wang
88
7
0
29 Sep 2023
AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices
Lehao Wang
Zhiwen Yu
Haoyi Yu
Sicong Liu
Yaxiong Xie
Bin Guo
Yunxin Liu
56
5
0
27 Sep 2023
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
Sicong Liu
Bin Guo
Cheng Fang
Ziqi Wang
Shiyan Luo
Zimu Zhou
Zhiwen Yu
AI4CE
111
23
0
27 Sep 2023
Efficient Post-training Quantization with FP8 Formats
Haihao Shen
Naveen Mellempudi
Xin He
Q. Gao
Chang‐Bao Wang
Mengni Wang
MQ
99
23
0
26 Sep 2023
Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization
Christopher Subia-Waud
S. Dasmahapatra
UQCV MQ
65
1
0
24 Sep 2023
ThinResNet: A New Baseline for Structured Convolutional Networks Pruning
Hugo Tessier
Ghouti Boukli Hacene
Vincent Gripon
63
1
0
22 Sep 2023
RAI4IoE: Responsible AI for Enabling the Internet of Energy
Minhui Xue
Surya Nepal
Ling Liu
Subbu Sethuvenkatraman
Xingliang Yuan
Carsten Rudolph
Ruoxi Sun
Greg Eisenhauer
113
5
0
20 Sep 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Haojun Xia
Zhen Zheng
Yuchao Li
Donglin Zhuang
Zhongzhu Zhou
Xiafei Qiu
Yong Li
Wei Lin
Shuaiwen Leon Song
102
15
0
19 Sep 2023
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang
Shumin Han
Xiaodi Wang
Jing Hao
Xianbin Cao
Baochang Zhang
VLM
74
0
0
18 Sep 2023
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
George August Wright
Umberto Cappellazzo
Salah Zaiem
Desh Raj
Lucas Ondel Yang
Daniele Falavigna
Mohamed Nabih Ali
Alessio Brutti
75
2
0
18 Sep 2023
Enhancing Quantised End-to-End ASR Models via Personalisation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MQ
64
3
0
17 Sep 2023
Scaling Laws for Sparsely-Connected Foundation Models
Elias Frantar
C. Riquelme
N. Houlsby
Dan Alistarh
Utku Evci
116
38
0
15 Sep 2023
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
Matteo Grimaldi
Darshan C. Ganji
Ivan Lazarevich
Sudhakar Sah
66
10
0
12 Sep 2023
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing
Clifford Broni-Bediako
Junshi Xia
Naoto Yokoya
93
10
0
12 Sep 2023
Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference
Kiwan Maeng
G. E. Suh
58
2
0
09 Sep 2023
Sparse Federated Training of Object Detection in the Internet of Vehicles
Luping Rao
Chuan Ma
Ming Ding
Yuwen Qian
Lu Zhou
Yanfeng Guo
35
2
0
07 Sep 2023
Bandwidth-efficient Inference for Neural Image Compression
Shanzhi Yin
Tongda Xu
Yongsheng Liang
Yuanyuan Wang
Yanghao Li
Yan Wang
Jingjing Liu
55
1
0
06 Sep 2023
Geometry of Sensitivity: Twice Sampling and Hybrid Clipping in Differential Privacy with Optimal Gaussian Noise and Application to Deep Learning
Hanshen Xiao
Jun Wan
Srini Devadas
71
8
0
06 Sep 2023
In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms
Philipp Schilk
Niccolò Polvani
Andrea Ronco
Milos Cernak
Michele Magno
75
12
0
05 Sep 2023
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
Wei Huang
Haotong Qin
Yangdong Liu
Jingzhuo Liang
Yifu Ding
Ying Li
Xianglong Liu
MQ
87
0
0
05 Sep 2023
Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks
Kacem Khaled
Mouna Dhaouadi
F. Magalhães
Gabriela Nicolescu
AAML
34
2
0
04 Sep 2023
On the fly Deep Neural Network Optimization Control for Low-Power Computer Vision
Ishmeet Kaur
Adwaita Janardhan Jadhav
51
0
0
04 Sep 2023
ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation
Nastaran Darabi
Maeesha Binte Hashem
Hongyi Pan
Ahmet Cetin
Wilfred Gomes
A. R. Trivedi
71
6
0
04 Sep 2023
Saturn: An Optimized Data System for Large Model Deep Learning Workloads
Kabir Nagrecha
Arun Kumar
110
6
0
03 Sep 2023
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho
Keivan Alizadeh Vahid
Qichen Fu
Saurabh N. Adya
C. C. D. Mundo
Mohammad Rastegari
Devang Naik
Peter Zatloukal
MQ
90
7
0
02 Sep 2023
Proof of Deep Learning: Approaches, Challenges, and Future Directions
Mahmoud Salhab
Khaleel W. Mershad
73
1
0
31 Aug 2023
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
Yizeng Han
Zeyu Liu
Zhihang Yuan
Yifan Pu
Chaofei Wang
Shiji Song
Gao Huang
113
23
0
30 Aug 2023