Quantizing deep convolutional networks for efficient inference: A whitepaper

21 June 2018

Papers citing "Quantizing deep convolutional networks for efficient inference: A whitepaper"

35 / 35 papers shown

Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs Shmulik Markovich-Golan Daniel Ohayon Itay Niv Yair Hanani MQ 83 0 0 19 May 2025
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques Sanjay Surendranath Girija Shashank Kapoor Lakshit Arora Dipen Pradhan Aman Raj Ankit Shetgaonkar 75 0 0 05 May 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models Yeona Hong Hyewon Han Woo-Jin Chung Hong-Goo Kang MQ 73 0 0 21 Apr 2025
Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats Angela Cratere M. Salim Farissi Andrea Carbone Marcello Asciolla Maria Rizzi Francesco Dell'Olio Augusto Nascetti Dario Spiller 127 1 0 04 Apr 2025
A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network Chih-Chyau Yang Tian-Sheuan Chang 97 1 0 27 Mar 2025
SpinQuant: LLM quantization with learned rotations Zechun Liu Changsheng Zhao Igor Fedorov Bilge Soran Dhruv Choudhary Raghuraman Krishnamoorthi Vikas Chandra Yuandong Tian Tijmen Blankevoort MQ 167 105 0 21 Feb 2025
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization Zechun Liu Changsheng Zhao Hanxian Huang Sijia Chen Jing Zhang ... Yuandong Tian Bilge Soran Raghuraman Krishnamoorthi Tijmen Blankevoort Vikas Chandra MQ 105 7 0 04 Feb 2025
HyperCam: Low-Power Onboard Computer Vision for IoT Cameras Chae Young Lee Maxwell Fite Tejus Rao Sara Achour Zerina Kapetanovic VLM 60 1 0 17 Jan 2025
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers Mingliang Xu Yuyao Zhou Yuxin Zhang Shen Li Yong Li Chia-Wen Lin Zhanpeng Zeng Rongrong Ji MQ 209 0 0 31 Dec 2024
Data Generation for Hardware-Friendly Post-Training Quantization Lior Dikstein Ariel Lapid Arnon Netzer H. Habi MQ 388 0 0 29 Oct 2024
EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision Yufeng Yang Adrian Kneip Charlotte Frenkel GNN 64 4 0 30 Apr 2024
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance Jaskirat Singh Emad Fallahzadeh Bram Adams Ahmed E. Hassan MQ 71 3 0 25 Mar 2024
On the Challenges and Opportunities in Generative AI Laura Manduchi Kushagra Pandey Robert Bamler Ryan Cotterell Sina Daubener ... F. Wenzel Frank Wood Stephan Mandt Vincent Fortuin 151 18 0 28 Feb 2024
TransAxx: Efficient Transformers with Approximate Computing Dimitrios Danopoulos Georgios Zervakis Dimitrios Soudris Jörg Henkel ViT 66 2 0 12 Feb 2024
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices Jianlei Yang Jiacheng Liao Fanding Lei Meichen Liu Junyi Chen Lingkun Long Han Wan Bei Yu Weisheng Zhao MoE 69 2 0 03 Nov 2023
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications Vasileios Leon Muhammad Abdullah Hanif Giorgos Armeniakos Xun Jiao Mohamed Bennai K. Pekmestzi Dimitrios Soudris 62 3 0 20 Jul 2023
Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming Itay Hubara Yury Nahshan Y. Hanani Ron Banner Daniel Soudry MQ 74 124 0 14 Jun 2020
Deep Learning at the Edge Sahar Voghoei N. Tonekaboni Jason G. Wallace H. Arabnia 95 41 0 22 Oct 2019
A Quantization-Friendly Separable Convolution for MobileNets Tao Sheng Chen Feng Shaojie Zhuo Xiaopeng Zhang Liang Shen M. Aleksic MQ 33 112 0 22 Mar 2018
Model compression via distillation and quantization A. Polino Razvan Pascanu Dan Alistarh MQ 69 722 0 15 Feb 2018
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference Benoit Jacob S. Kligys Bo Chen Menglong Zhu Matthew Tang Andrew G. Howard Hartwig Adam Dmitry Kalenichenko MQ 126 3,090 0 15 Dec 2017
Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy Asit K. Mishra Debbie Marr FedML 63 330 0 15 Nov 2017
WRPN: Wide Reduced-Precision Networks Asit K. Mishra Eriko Nurvitadhi Jeffrey J. Cook Debbie Marr MQ 62 267 0 04 Sep 2017
Learning Transferable Architectures for Scalable Image Recognition Barret Zoph Vijay Vasudevan Jonathon Shlens Quoc V. Le 148 5,577 0 21 Jul 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand M. Andreetto Hartwig Adam 3DH 1.1K 20,747 0 17 Apr 2017
Efficient Processing of Deep Neural Networks: A Tutorial and Survey Vivienne Sze Yu-hsin Chen Tien-Ju Yang J. Emer AAML 3DV 96 3,002 0 27 Mar 2017
Identity Mappings in Deep Residual Networks Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun 300 10,149 0 16 Mar 2016
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size F. Iandola Song Han Matthew W. Moskewicz Khalid Ashraf W. Dally Kurt Keutzer 130 7,448 0 24 Feb 2016
EIE: Efficient Inference Engine on Compressed Deep Neural Network Song Han Xingyu Liu Huizi Mao Jing Pu A. Pedram M. Horowitz W. Dally 110 2,453 0 04 Feb 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 1.5K 192,638 0 10 Dec 2015
Rethinking the Inception Architecture for Computer Vision Christian Szegedy Vincent Vanhoucke Sergey Ioffe Jonathon Shlens Z. Wojna 3DV BDL 541 27,231 0 02 Dec 2015
BinaryConnect: Training Deep Neural Networks with binary weights during propagations Matthieu Courbariaux Yoshua Bengio J. David MQ 156 2,976 0 02 Nov 2015
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding Song Han Huizi Mao W. Dally 3DGS 212 8,793 0 01 Oct 2015
Distilling the Knowledge in a Neural Network Geoffrey E. Hinton Oriol Vinyals J. Dean FedML 273 19,523 0 09 Mar 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Sergey Ioffe Christian Szegedy OOD 365 43,154 0 11 Feb 2015