arXiv: 2106.08295
A White Paper on Neural Network Quantization
15 June 2021
Markus Nagel
Marios Fournarakis
Rana Ali Amjad
Yelysei Bondarenko
M. V. Baalen
Tijmen Blankevoort
MQ
Papers citing "A White Paper on Neural Network Quantization"
47 / 247 papers shown
Title
Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance
Ian Colbert
Alessandro Pappalardo
Jakoba Petri-Koenig
MQ
19
4
0
31 Jan 2023
BOMP-NAS: Bayesian Optimization Mixed Precision NAS
David van Son
F. D. Putter
Sebastian Vogel
Henk Corporaal
MQ
27
3
0
27 Jan 2023
Optimized learned entropy coding parameters for practical neural-based image and video compression
A. Said
Reza Pourreza
H. Le
MQ
35
2
0
20 Jan 2023
Person Detection Using an Ultra Low-resolution Thermal Imager on a Low-cost MCU
Maarten Vandersteegen
Wouter Reusen
Kristof Van Beeck
Toon Goedemé
23
2
0
16 Dec 2022
PD-Quant: Post-Training Quantization based on Prediction Difference Metric
Jiawei Liu
Lin Niu
Zhihang Yuan
Dawei Yang
Xinggang Wang
Wenyu Liu
MQ
96
68
0
14 Dec 2022
QVIP: An ILP-based Formal Verification Approach for Quantized Neural Networks
Yedi Zhang
Zhe Zhao
Fu Song
Mengdi Zhang
Tao Chen
Jun Sun
36
17
0
10 Dec 2022
QEBVerif: Quantization Error Bound Verification of Neural Networks
Yedi Zhang
Fu Song
Jun Sun
MQ
26
11
0
06 Dec 2022
QFT: Post-training quantization via fast joint finetuning of all degrees of freedom
Alexander Finkelstein
Ella Fuchs
Idan Tal
Mark Grobman
Niv Vosco
Eldad Meller
MQ
26
6
0
05 Dec 2022
Device Interoperability for Learned Image Compression with Weights and Activations Quantization
Esin Koyuncu
T. Solovyev
Elena Alshina
Andre Kaup
21
10
0
02 Dec 2022
Post-training Quantization on Diffusion Models
Yuzhang Shang
Zhihang Yuan
Bin Xie
Bingzhe Wu
Yan Yan
DiffM
MQ
15
159
0
28 Nov 2022
AskewSGD : An Annealed interval-constrained Optimisation method to train Quantized Neural Networks
Louis Leconte
S. Schechtman
Eric Moulines
29
4
0
07 Nov 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar
Saleh Ashkboos
Torsten Hoefler
Dan Alistarh
MQ
24
886
0
31 Oct 2022
Neural Networks with Quantization Constraints
Ignacio Hounie
Juan Elenter
Alejandro Ribeiro
MQ
21
4
0
27 Oct 2022
Desiderata for next generation of ML model serving
Sherif Akoush
Andrei Paleyes
A. V. Looveren
Clive Cox
33
5
0
26 Oct 2022
Knowledge Distillation approach towards Melanoma Detection
Md Shakib Khan
Kazi Nabiul Alam
Abdur Rab Dhruba
H. Zunair
Nabeel Mohammed
32
23
0
14 Oct 2022
Inference Latency Prediction at the Edge
Zhuojin Li
Marco Paolieri
L. Golubchik
27
3
0
06 Oct 2022
SAMP: A Model Inference Toolkit of Post-Training Quantization for Text Processing via Self-Adaptive Mixed-Precision
Rong Tian
Zijing Zhao
Weijie Liu
Haoyan Liu
Weiquan Mao
Zhe Zhao
Kimmo Yan
MQ
19
5
0
19 Sep 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
82
31
0
14 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
20
55
0
30 Aug 2022
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
Elias Frantar
Sidak Pal Singh
Dan Alistarh
MQ
25
216
0
24 Aug 2022
Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey
Dalin Zhang
Kaixuan Chen
Yan Zhao
B. Yang
Li-Ping Yao
Christian S. Jensen
46
3
0
22 Aug 2022
FP8 Quantization: The Power of the Exponent
Andrey Kuzmin
M. V. Baalen
Yuwei Ren
Markus Nagel
Jorn W. T. Peters
Tijmen Blankevoort
MQ
25
78
0
19 Aug 2022
Boosting neural video codecs by exploiting hierarchical redundancy
Reza Pourreza
H. Le
A. Said
Guillaume Sautière
Auke Wiggers
23
13
0
08 Aug 2022
Quantized Sparse Weight Decomposition for Neural Network Compression
Andrey Kuzmin
M. V. Baalen
Markus Nagel
Arash Behboodi
MQ
14
3
0
22 Jul 2022
Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime
Saad Ashfaq
Mohammadhossein Askarihemmat
Sudhakar Sah
Ehsan Saboori
Olivier Mastropietro
Alexander Hoffman
BDL
MQ
21
4
0
18 Jul 2022
MobileCodec: Neural Inter-frame Video Compression on Mobile Devices
H. Le
Liang Zhang
A. Said
Guillaume Sautière
Yang Yang
Pranav Shrestha
Fei Yin
Reza Pourreza
Auke Wiggers
28
30
0
18 Jul 2022
Quantization Robust Federated Learning for Efficient Inference on Heterogeneous Devices
Kartik Gupta
Marios Fournarakis
M. Reisser
Christos Louizos
Markus Nagel
FedML
19
14
0
22 Jun 2022
Wavelet Feature Maps Compression for Image-to-Image CNNs
Shahaf E. Finder
Yair Zohav
Maor Ashkenazi
Eran Treister
19
17
0
24 May 2022
RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization
Hongyi Yao
Pu Li
Jian Cao
Xiangcheng Liu
Chenying Xie
Bin Wang
MQ
23
12
0
26 Apr 2022
Vision Transformer Compression with Structured Pruning and Low Rank Approximation
Ankur Kumar
ViT
26
6
0
25 Mar 2022
Overcoming Oscillations in Quantization-Aware Training
Markus Nagel
Marios Fournarakis
Yelysei Bondarenko
Tijmen Blankevoort
MQ
111
101
0
21 Mar 2022
TinyMLOps: Operational Challenges for Widespread Edge AI Adoption
Sam Leroux
Pieter Simoens
Meelis Lootus
Kartik Thakore
Akshay Sharma
24
16
0
21 Mar 2022
An Empirical Study of Low Precision Quantization for TinyML
Shaojie Zhuo
Hongyu Chen
R. Ramakrishnan
Tommy Chen
Chen Feng
Yi-Rung Lin
Parker Zhang
Liang Shen
MQ
32
13
0
10 Mar 2022
Post-Training Quantization for Cross-Platform Learned Image Compression
Dailan He
Zi Yang
Yuan-Hsin Chen
Qi Zhang
Hongwei Qin
Yan Wang
MQ
42
13
0
15 Feb 2022
F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
Qing Jin
Jian Ren
Richard Zhuang
Sumant Hanumante
Zhengang Li
Zhiyu Chen
Yanzhi Wang
Kai-Min Yang
Sergey Tulyakov
MQ
24
48
0
10 Feb 2022
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
Georgii Sergeevich Novikov
Daniel Bershatsky
Julia Gusak
Alex Shonenkov
Denis Dimitrov
Ivan V. Oseledets
MQ
26
17
0
01 Feb 2022
SPDY: Accurate Pruning with Speedup Guarantees
Elias Frantar
Dan Alistarh
39
33
0
31 Jan 2022
Implicit Neural Video Compression
Yunfan Zhang
T. V. Rozendaal
Johann Brehmer
Markus Nagel
Taco S. Cohen
49
57
0
21 Dec 2021
Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
Brian Chmiel
Ron Banner
Elad Hoffer
Hilla Ben Yaacov
Daniel Soudry
MQ
30
22
0
19 Dec 2021
Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set
T. V. Rozendaal
Johann Brehmer
Yunfan Zhang
Reza Pourreza
Auke Wiggers
Taco S. Cohen
39
24
0
19 Nov 2021
An Underexplored Dilemma between Confidence and Calibration in Quantized Neural Networks
Guoxuan Xia
Sangwon Ha
Tiago Azevedo
Partha P. Maji
UQCV
17
1
0
10 Nov 2021
Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
25
133
0
27 Sep 2021
HPTQ: Hardware-Friendly Post Training Quantization
H. Habi
Reuven Peretz
Elad Cohen
Lior Dikstein
Oranit Dror
I. Diamant
Roy H. Jennings
Arnon Netzer
MQ
31
8
0
19 Sep 2021
DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks
Chee Hong
Heewon Kim
Sungyong Baik
Junghun Oh
Kyoung Mu Lee
OOD
SupR
MQ
21
41
0
21 Dec 2020
Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces
Bert Moons
Parham Noorzad
Andrii Skliar
G. Mariani
Dushyant Mehta
Chris Lott
Tijmen Blankevoort
145
43
0
16 Dec 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,567
0
17 Apr 2017