Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1806.08342
Cited By
Quantizing deep convolutional networks for efficient inference: A whitepaper
21 June 2018
Raghuraman Krishnamoorthi
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Quantizing deep convolutional networks for efficient inference: A whitepaper"
35 / 35 papers shown
Title
Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs
Shmulik Markovich-Golan
Daniel Ohayon
Itay Niv
Yair Hanani
MQ
83
0
0
19 May 2025
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Sanjay Surendranath Girija
Shashank Kapoor
Lakshit Arora
Dipen Pradhan
Aman Raj
Ankit Shetgaonkar
75
0
0
05 May 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
73
0
0
21 Apr 2025
Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats
Angela Cratere
M. Salim Farissi
Andrea Carbone
Marcello Asciolla
Maria Rizzi
Francesco DellÓlio
Augusto Nascetti
Dario Spiller
127
1
0
04 Apr 2025
A 71.2-
μ
μ
μ
W Speech Recognition Accelerator with Recurrent Spiking Neural Network
Chih-Chyau Yang
Tian-Sheuan Chang
97
1
0
27 Mar 2025
SpinQuant: LLM quantization with learned rotations
Zechun Liu
Changsheng Zhao
Igor Fedorov
Bilge Soran
Dhruv Choudhary
Raghuraman Krishnamoorthi
Vikas Chandra
Yuandong Tian
Tijmen Blankevoort
MQ
167
105
0
21 Feb 2025
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu
Changsheng Zhao
Hanxian Huang
Sijia Chen
Jing Zhang
...
Yuandong Tian
Bilge Soran
Raghuraman Krishnamoorthi
Tijmen Blankevoort
Vikas Chandra
MQ
105
7
0
04 Feb 2025
HyperCam: Low-Power Onboard Computer Vision for IoT Cameras
Chae Young Lee
Maxwell Fite
Tejus Rao
Sara Achour
Zerina Kapetanovic
VLM
60
1
0
17 Jan 2025
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
Mingliang Xu
Yuyao Zhou
Yuxin Zhang
Shen Li
Yong Li
Chia-Wen Lin
Zhanpeng Zeng
Rongrong Ji
MQ
209
0
0
31 Dec 2024
Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein
Ariel Lapid
Arnon Netzer
H. Habi
MQ
388
0
0
29 Oct 2024
EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision
Yufeng Yang
Adrian Kneip
Charlotte Frenkel
GNN
64
4
0
30 Apr 2024
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Emad Fallahzadeh
Bram Adams
Ahmed E. Hassan
MQ
71
3
0
25 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
Vincent Fortuin
151
18
0
28 Feb 2024
TransAxx: Efficient Transformers with Approximate Computing
Dimitrios Danopoulos
Georgios Zervakis
Dimitrios Soudris
Jörg Henkel
ViT
66
2
0
12 Feb 2024
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Jianlei Yang
Jiacheng Liao
Fanding Lei
Meichen Liu
Junyi Chen
Lingkun Long
Han Wan
Bei Yu
Weisheng Zhao
MoE
69
2
0
03 Nov 2023
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
Vasileios Leon
Muhammad Abdullah Hanif
Giorgos Armeniakos
Xun Jiao
Mohamed Bennai
K. Pekmestzi
Dimitrios Soudris
62
3
0
20 Jul 2023
Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
Itay Hubara
Yury Nahshan
Y. Hanani
Ron Banner
Daniel Soudry
MQ
74
124
0
14 Jun 2020
Deep Learning at the Edge
Sahar Voghoei
N. Tonekaboni
Jason G. Wallace
H. Arabnia
95
41
0
22 Oct 2019
A Quantization-Friendly Separable Convolution for MobileNets
Tao Sheng
Chen Feng
Shaojie Zhuo
Xiaopeng Zhang
Liang Shen
M. Aleksic
MQ
33
112
0
22 Mar 2018
Model compression via distillation and quantization
A. Polino
Razvan Pascanu
Dan Alistarh
MQ
69
722
0
15 Feb 2018
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
126
3,090
0
15 Dec 2017
Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
Asit K. Mishra
Debbie Marr
FedML
63
330
0
15 Nov 2017
WRPN: Wide Reduced-Precision Networks
Asit K. Mishra
Eriko Nurvitadhi
Jeffrey J. Cook
Debbie Marr
MQ
62
267
0
04 Sep 2017
Learning Transferable Architectures for Scalable Image Recognition
Barret Zoph
Vijay Vasudevan
Jonathon Shlens
Quoc V. Le
148
5,577
0
21 Jul 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.1K
20,747
0
17 Apr 2017
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Vivienne Sze
Yu-hsin Chen
Tien-Ju Yang
J. Emer
AAML
3DV
96
3,002
0
27 Mar 2017
Identity Mappings in Deep Residual Networks
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
300
10,149
0
16 Mar 2016
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
F. Iandola
Song Han
Matthew W. Moskewicz
Khalid Ashraf
W. Dally
Kurt Keutzer
130
7,448
0
24 Feb 2016
EIE: Efficient Inference Engine on Compressed Deep Neural Network
Song Han
Xingyu Liu
Huizi Mao
Jing Pu
A. Pedram
M. Horowitz
W. Dally
110
2,453
0
04 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.5K
192,638
0
10 Dec 2015
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
541
27,231
0
02 Dec 2015
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
Matthieu Courbariaux
Yoshua Bengio
J. David
MQ
156
2,976
0
02 Nov 2015
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han
Huizi Mao
W. Dally
3DGS
212
8,793
0
01 Oct 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
273
19,523
0
09 Mar 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
365
43,154
0
11 Feb 2015
1