A White Paper on Neural Network Quantization

15 June 2021
Markus Nagel
Marios Fournarakis
Rana Ali Amjad
Yelysei Bondarenko
M. V. Baalen
Tijmen Blankevoort
    MQ
ArXiv (abs) · PDF · HTML

Papers citing "A White Paper on Neural Network Quantization"

50 / 264 papers shown
FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization
J. H. Lee
Jeonghoon Kim
S. Kwon
Dongsoo Lee
MQ
110
38
0
01 Jun 2023
Intriguing Properties of Quantization at Scale
Arash Ahmadian
Saurabh Dash
Hongyu Chen
Bharat Venkitesh
Stephen Gou
Phil Blunsom
Ahmet Üstün
Sara Hooker
MQ
121
38
0
30 May 2023
Binary stochasticity enabled highly efficient neuromorphic deep learning achieves better-than-software accuracy
Yang Li
Wei Wang
Ming Wang
C. Dou
Zhengyu Ma
...
Guanhua Yang
Feng Zhang
Ling Li
Daniele Ielmini
Ming-Yuan Liu
27
5
0
25 Apr 2023
Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric
Lin Niu
Jia-Wen Liu
Zhihang Yuan
Dawei Yang
Xinggang Wang
Wenyu Liu
MQ
61
2
0
19 Apr 2023
Arrhythmia Classifier Based on Ultra-Lightweight Binary Neural Network
Ninghao Pu
Zhong-Li Wu
Ao Wang
Hanshi Sun
Zijing Liu
Hao Liu
MQ
36
7
0
04 Apr 2023
FP8 versus INT8 for efficient deep learning inference
M. V. Baalen
Andrey Kuzmin
Suparna S. Nair
Yuwei Ren
E. Mahurin
...
Sundar Subramanian
Sanghyuk Lee
Markus Nagel
Joseph B. Soriaga
Tijmen Blankevoort
MQ
75
48
0
31 Mar 2023
Tetra-AML: Automatic Machine Learning via Tensor Networks
A. Naumov
Ar. Melnikov
V. Abronin
F. Oxanichenko
K. Izmailov
M. Pflitsch
A. Melnikov
M. Perelshtein
61
11
0
28 Mar 2023
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance
Zhihang Yuan
Jiawei Liu
Jiaxiang Wu
Dawei Yang
Qiang Wu
Guangyu Sun
Wenyu Liu
Xinggang Wang
Bingzhe Wu
MQ
74
7
0
23 Mar 2023
Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training
Xinwei Ou
Zhangxin Chen
Ce Zhu
Yipeng Liu
79
5
0
22 Mar 2023
Unit Scaling: Out-of-the-Box Low-Precision Training
Charlie Blake
Douglas Orr
Carlo Luschi
MQ
64
7
0
20 Mar 2023
MoRF: Mobile Realistic Fullbody Avatars from a Monocular Video
Renat Bashirov
A. Larionov
E. Ustinova
Mikhail Sidorenko
D. Svitov
Ilya Zakharkin
Victor Lempitsky
3DH
88
3
0
17 Mar 2023
Operating critical machine learning models in resource constrained regimes
Raghavendra Selvan
Julian Schon
Erik Dam
MedIm
85
8
0
17 Mar 2023
QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms
Guillaume Berger
Manik Dhingra
Antoine Mercier
Yash Savani
Sunny Panchal
Fatih Porikli
SupR
55
5
0
08 Mar 2023
RQAT-INR: Improved Implicit Neural Image Compression
B. Damodaran
M. Balcilar
Franck Galpin
Pierre Hellier
43
9
0
06 Mar 2023
Hierarchical Training of Deep Neural Networks Using Early Exiting
Yamin Sepehri
P. Pad
A. C. Yüzügüler
P. Frossard
L. A. Dunbar
81
9
0
04 Mar 2023
Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators
Malte J. Rasch
C. Mackin
Manuel Le Gallo
An Chen
A. Fasoli
...
P. Narayanan
H. Tsai
G. Burr
Abu Sebastian
Vijay Narayanan
69
96
0
16 Feb 2023
Towards Optimal Compression: Joint Pruning and Quantization
Ben Zandonati
Glenn Bucagu
Adrian Alan Pol
M. Pierini
Olya Sirkin
Tal Kopetz
MQ
81
3
0
15 Feb 2023
A Practical Mixed Precision Algorithm for Post-Training Quantization
N. Pandey
Markus Nagel
M. V. Baalen
Yin-Ruey Huang
Chirag I. Patel
Tijmen Blankevoort
MQ
64
22
0
10 Feb 2023
Q-Diffusion: Quantizing Diffusion Models
Xiuyu Li
Yijia Liu
Long Lian
Hua Yang
Zhen Dong
Daniel Kang
Shanghang Zhang
Kurt Keutzer
DiffM, MQ
162
177
0
08 Feb 2023
Training with Mixed-Precision Floating-Point Assignments
Wonyeol Lee
Rahul Sharma
A. Aiken
MQ
34
3
0
31 Jan 2023
Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance
Ian Colbert
Alessandro Pappalardo
Jakoba Petri-Koenig
MQ
31
4
0
31 Jan 2023
BOMP-NAS: Bayesian Optimization Mixed Precision NAS
David van Son
F. D. Putter
Sebastian Vogel
Henk Corporaal
MQ
61
3
0
27 Jan 2023
Optimized learned entropy coding parameters for practical neural-based image and video compression
A. Said
Reza Pourreza
H. Le
MQ
40
2
0
20 Jan 2023
Person Detection Using an Ultra Low-resolution Thermal Imager on a Low-cost MCU
Maarten Vandersteegen
Wouter Reusen
Kristof Van Beeck
Toon Goedemé
40
2
0
16 Dec 2022
PD-Quant: Post-Training Quantization based on Prediction Difference Metric
Jiawei Liu
Lin Niu
Zhihang Yuan
Dawei Yang
Xinggang Wang
Wenyu Liu
MQ
185
71
0
14 Dec 2022
QVIP: An ILP-based Formal Verification Approach for Quantized Neural Networks
Yedi Zhang
Zhe Zhao
Fu Song
Hao Fei
Tao Chen
Jun Sun
69
18
0
10 Dec 2022
QEBVerif: Quantization Error Bound Verification of Neural Networks
Yedi Zhang
Fu Song
Jun Sun
MQ
99
12
0
06 Dec 2022
QFT: Post-training quantization via fast joint finetuning of all degrees of freedom
Alexander Finkelstein
Ella Fuchs
Idan Tal
Mark Grobman
Niv Vosco
Eldad Meller
MQ
74
7
0
05 Dec 2022
Device Interoperability for Learned Image Compression with Weights and Activations Quantization
Esin Koyuncu
T. Solovyev
Elena Alshina
Andre Kaup
58
10
0
02 Dec 2022
Post-training Quantization on Diffusion Models
Yuzhang Shang
Zhihang Yuan
Bin Xie
Bingzhe Wu
Yan Yan
DiffM, MQ
151
182
0
28 Nov 2022
AskewSGD : An Annealed interval-constrained Optimisation method to train Quantized Neural Networks
Louis Leconte
S. Schechtman
Eric Moulines
83
4
0
07 Nov 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar
Saleh Ashkboos
Torsten Hoefler
Dan Alistarh
MQ
191
1,013
0
31 Oct 2022
Neural Networks with Quantization Constraints
Ignacio Hounie
Juan Elenter
Alejandro Ribeiro
MQ
41
5
0
27 Oct 2022
Desiderata for next generation of ML model serving
Sherif Akoush
Andrei Paleyes
A. V. Looveren
Clive Cox
79
6
0
26 Oct 2022
Knowledge Distillation approach towards Melanoma Detection
Md Shakib Khan
Kazi Nabiul Alam
Abdur Rab Dhruba
H. Zunair
Nabeel Mohammed
65
24
0
14 Oct 2022
Inference Latency Prediction at the Edge
Zhuojin Li
Marco Paolieri
L. Golubchik
56
3
0
06 Oct 2022
SAMP: A Model Inference Toolkit of Post-Training Quantization for Text Processing via Self-Adaptive Mixed-Precision
Rong Tian
Zijing Zhao
Weijie Liu
Haoyan Liu
Weiquan Mao
Zhe Zhao
Kimmo Yan
MQ
50
5
0
19 Sep 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
160
32
0
14 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
83
60
0
30 Aug 2022
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
Elias Frantar
Sidak Pal Singh
Dan Alistarh
MQ
141
245
0
24 Aug 2022
Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey
Dalin Zhang
Kaixuan Chen
Yan Zhao
B. Yang
Li-Ping Yao
Christian S. Jensen
118
3
0
22 Aug 2022
FP8 Quantization: The Power of the Exponent
Andrey Kuzmin
M. V. Baalen
Yuwei Ren
Markus Nagel
Jorn W. T. Peters
Tijmen Blankevoort
MQ
88
87
0
19 Aug 2022
Boosting neural video codecs by exploiting hierarchical redundancy
Reza Pourreza
H. Le
A. Said
Guillaume Sautière
Auke Wiggers
53
14
0
08 Aug 2022
Quantized Sparse Weight Decomposition for Neural Network Compression
Andrey Kuzmin
M. V. Baalen
Markus Nagel
Arash Behboodi
MQ
52
3
0
22 Jul 2022
Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime
Saad Ashfaq
Mohammadhossein Askarihemmat
Sudhakar Sah
Ehsan Saboori
Olivier Mastropietro
Alexander Hoffman
BDL, MQ
31
5
0
18 Jul 2022
MobileCodec: Neural Inter-frame Video Compression on Mobile Devices
H. Le
Liang Zhang
A. Said
Guillaume Sautière
Yang Yang
Pranav Shrestha
Fei Yin
Reza Pourreza
Auke Wiggers
62
31
0
18 Jul 2022
Quantization Robust Federated Learning for Efficient Inference on Heterogeneous Devices
Kartik Gupta
Marios Fournarakis
M. Reisser
Christos Louizos
Markus Nagel
FedML
69
16
0
22 Jun 2022
Wavelet Feature Maps Compression for Image-to-Image CNNs
Shahaf E. Finder
Yair Zohav
Maor Ashkenazi
Eran Treister
114
22
0
24 May 2022
RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization
Hongyi Yao
Pu Li
Jian Cao
Xiangcheng Liu
Chenying Xie
Bin Wang
MQ
95
12
0
26 Apr 2022
Vision Transformer Compression with Structured Pruning and Low Rank Approximation
Ankur Kumar
ViT
38
6
0
25 Mar 2022