Deep learning model compression using network sensitivity and gradients

11 October 2022

Papers citing "Deep learning model compression using network sensitivity and gradients"


| Title | Authors | Topic tags | Counts (as listed) | Date |
|---|---|---|---|---|
| Network Quantization with Element-wise Gradient Scaling | Junghyup Lee, Dohyung Kim, Bumsub Ham | MQ | 79, 120, 0 | 02 Apr 2021 |
| EfficientNetV2: Smaller Models and Faster Training | Mingxing Tan, Quoc V. Le | EgoV | 137, 2,730, 0 | 01 Apr 2021 |
| Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks | Julieta Martinez, Jashan Shewakramani, Ting Liu, Ioan Andrei Bârsan, Wenyuan Zeng, R. Urtasun | MQ | 74, 31, 0 | 29 Oct 2020 |
| PROFIT: A Novel Training Method for sub-4-bit MobileNet Models | Eunhyeok Park, S. Yoo | MQ | 52, 85, 0 | 11 Aug 2020 |
| GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference | Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos | MQ | 67, 188, 0 | 08 May 2020 |
| Training with Quantization Noise for Extreme Model Compression | Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Rémi Gribonval, Hervé Jégou, Armand Joulin | MQ | 106, 246, 0 | 15 Apr 2020 |
| Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks | Yuhang Li, Xin Dong, Wei Wang | MQ | 66, 259, 0 | 28 Sep 2019 |
| Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks | Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, F. Yu, Junjie Yan | MQ | 79, 459, 0 | 14 Aug 2019 |
| And the Bit Goes Down: Revisiting the Quantization of Neural Networks | Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou | MQ | 104, 149, 0 | 12 Jul 2019 |
| EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | Mingxing Tan, Quoc V. Le | 3DV, MedIm | 194, 18,224, 0 | 28 May 2019 |
| HAQ: Hardware-Aware Automated Quantization with Mixed Precision | Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han | MQ | 134, 885, 0 | 21 Nov 2018 |
| MobileNetV2: Inverted Residuals and Linear Bottlenecks | Mark Sandler, Andrew G. Howard, Menglong Zhu, A. Zhmoginov, Liang-Chieh Chen | | 234, 19,353, 0 | 13 Jan 2018 |
| Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks | Shuchang Zhou, Yuzhi Wang, He Wen, Qinyao He, Yuheng Zou | MQ | 96, 110, 0 | 22 Jun 2017 |
| MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam | 3DH | 1.2K, 20,918, 0 | 17 Apr 2017 |
| On the Compression of Recurrent Neural Networks with an Application to LVCSR Acoustic Modeling for Embedded Speech Recognition | Rohit Prabhavalkar, O. Alsharif, A. Bruguier, Ian McGraw | | 70, 103, 0 | 25 Mar 2016 |
| Personalized Speech Recognition on Mobile Devices | Ian McGraw, Rohit Prabhavalkar, R. Álvarez, Montse Gonzalez Arenas, Kanishka Rao, ..., O. Alsharif, Hasim Sak, A. Gruenstein, F. Beaufays, Carolina Parada | | 103, 184, 0 | 10 Mar 2016 |
| Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding | Song Han, Huizi Mao, W. Dally | 3DGS | 263, 8,864, 0 | 01 Oct 2015 |
| Deep Speech: Scaling up end-to-end speech recognition | Awni Y. Hannun, Carl Case, Jared Casper, Bryan Catanzaro, G. Diamos, ..., R. Prenger, S. Satheesh, Shubho Sengupta, Adam Coates, A. Ng | | 195, 2,128, 0 | 17 Dec 2014 |