2004.09602
Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
20 April 2020
Hao Wu
Patrick Judd
Xiaojie Zhang
Mikhail Isaev
Paulius Micikevicius
MQ
Papers citing
"Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation"
50 / 55 papers shown
Title
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
Navin Ranjan
Andreas E. Savakis
MQ
VLM
70
0
0
08 May 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
33
0
0
21 Apr 2025
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
Yingsong Luo
Ling Chen
MQ
23
0
0
16 Oct 2024
Accumulator-Aware Post-Training Quantization
Ian Colbert
Fabian Grob
Giuseppe Franco
Jinjie Zhang
Rayan Saab
MQ
35
4
0
25 Sep 2024
Temporal Feature Matters: A Framework for Diffusion Model Quantization
Yushi Huang
Ruihao Gong
Xianglong Liu
Jing Liu
Yuhang Li
Jiwen Lu
Dacheng Tao
DiffM
MQ
49
0
0
28 Jul 2024
Quantizing YOLOv7: A Comprehensive Study
Mohammadamin Baghbanbashi
Mohsen Raji
B. Ghavami
MQ
34
8
0
06 Jul 2024
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu
Yuan Cheng
Dawei Yang
Zhihang Yuan
Jiangyong Yu
Chen Xu
Sifan Zhou
MQ
40
8
0
28 May 2024
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Emad Fallahzadeh
Bram Adams
Ahmed E. Hassan
MQ
45
3
0
25 Mar 2024
Achieving Pareto Optimality using Efficient Parameter Reduction for DNNs in Resource-Constrained Edge Environment
Atah Nuh Mih
Alireza Rahimi
Asfia Kawnine
Francis Palma
Monica Wachowicz
R. Dubay
Hung Cao
31
0
0
14 Mar 2024
A Plug-in Tiny AI Module for Intelligent and Selective Sensor Data Transmission
Wenjun Huang
Arghavan Rezvani
Hanning Chen
Yang Ni
Sanggeon Yun
Sungheon Jeong
Mohsen Imani
32
7
0
03 Feb 2024
Knowledge Translation: A New Pathway for Model Compression
Wujie Sun
Defang Chen
Jiawei Chen
Yan Feng
Chun-Yen Chen
Can Wang
31
0
0
11 Jan 2024
SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM
Jiayi Pan
Chengcan Wang
Kaifu Zheng
Yangguang Li
Zhenyu Wang
Bin Feng
MQ
43
7
0
06 Dec 2023
Λ-Split: A Privacy-Preserving Split Computing Framework for Cloud-Powered Generative AI
Shoki Ohta
Takayuki Nishio
75
4
0
23 Oct 2023
INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
Lakshmi Nair
Mikhail Bernadskiy
Arulselvan Madhavan
Craig Chan
Ayon Basumallik
D. Bunandar
MQ
46
2
0
07 Jul 2023
A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact
Álvaro Huertas-García
Carlos Martí-González
Rubén García Maezo
Alejandro Echeverría Rey
35
3
0
01 Jul 2023
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
Chong Yu
Tao Chen
Zhongxue Gan
Jiayuan Fan
MQ
ViT
33
23
0
18 May 2023
Evil from Within: Machine Learning Backdoors through Hardware Trojans
Alexander Warnecke
Julian Speith
Janka Möller
Konrad Rieck
C. Paar
AAML
29
3
0
17 Apr 2023
Mathematical Challenges in Deep Learning
V. Nia
Guojun Zhang
I. Kobyzev
Michael R. Metel
Xinlin Li
...
S. Hemati
M. Asgharian
Linglong Kong
Wulong Liu
Boxing Chen
AI4CE
VLM
37
1
0
24 Mar 2023
A High-Performance Accelerator for Super-Resolution Processing on Embedded GPU
W. Zhao
Qi Sun
Yang Bai
Wenbo Li
Haisheng Zheng
Bei Yu
Martin D. F. Wong
SupR
47
8
0
16 Mar 2023
Rotation Invariant Quantization for Model Compression
Dor-Joseph Kampeas
Yury Nahshan
Hanoch Kremer
Gil Lederman
Shira Zaloshinski
Zheng Li
E. Haleva
MQ
23
1
0
03 Mar 2023
Mixed Precision Post Training Quantization of Neural Networks with Sensitivity Guided Search
Clemens J. S. Schaefer
Elfie Guo
Caitlin Stanton
Xiaofan Zhang
T. Jablin
Navid Lambert-Shirzad
Jian Li
Chia-Wei Chou
Siddharth Joshi
Yu Wang
MQ
31
3
0
02 Feb 2023
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
27
218
0
19 Dec 2022
QFT: Post-training quantization via fast joint finetuning of all degrees of freedom
Alexander Finkelstein
Ella Fuchs
Idan Tal
Mark Grobman
Niv Vosco
Eldad Meller
MQ
34
6
0
05 Dec 2022
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee
Sandipan Dandapat
Monojit Choudhury
T. Ganu
Kalika Bali
31
5
0
27 Oct 2022
TPU-MLIR: A Compiler For TPU Using MLIR
Pengchao Hu
Man Lu
Lei Wang
Guoyue Jiang
22
5
0
23 Oct 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei
Yunchen Zhang
Xiangguo Zhang
Ruihao Gong
Shanghang Zhang
Qi Zhang
F. Yu
Xianglong Liu
MQ
40
147
0
27 Sep 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
82
31
0
14 Sep 2022
FP8 Formats for Deep Learning
Paulius Micikevicius
Dusan Stosic
N. Burgess
Marius Cornea
Pradeep Dubey
...
Naveen Mellempudi
S. Oberman
Mohammad Shoeybi
Michael Siu
Hao Wu
BDL
VLM
MQ
77
126
0
12 Sep 2022
Mixed-Precision Neural Networks: A Survey
M. Rakka
M. Fouda
Pramod P. Khargonekar
Fadi J. Kurdahi
MQ
30
11
0
11 Aug 2022
Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
Sein Park
Yeongsang Jang
Eunhyeok Park
MQ
26
2
0
31 Jul 2022
Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs
Benjamin Fuhrer
Yuval Shpigelman
Chen Tessler
Shie Mannor
Gal Chechik
E. Zahavi
Gal Dalal
33
4
0
05 Jul 2022
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Zhikai Li
Qingyi Gu
MQ
57
96
0
04 Jul 2022
Answer Fast: Accelerating BERT on the Tensor Streaming Processor
I. Ahmed
Sahil Parmar
Matthew Boyd
Michael Beidler
Kris Kang
Bill Liu
Kyle Roach
John Kim
D. Abts
LLMAG
20
6
0
22 Jun 2022
Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training
Charbel Sakr
Steve Dai
Rangharajan Venkatesan
B. Zimmer
W. Dally
Brucek Khailany
MQ
27
41
0
13 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
73
448
0
04 Jun 2022
What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
AI4CE
46
9
0
22 May 2022
Adaptive Block Floating-Point for Analog Deep Learning Hardware
Ayon Basumallik
D. Bunandar
Nicholas Dronen
Nicholas Harris
Ludmila Levkova
Calvin McCarter
Lakshmi Nair
David Walter
David Widemann
19
6
0
12 May 2022
Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware
B. Sudharsan
Dineshkumar Sundaram
Pankesh Patel
J. Breslin
M. Ali
Schahram Dustdar
Albert Zomaya
R. Ranjan
23
2
0
20 Apr 2022
ICSML: Industrial Control Systems ML Framework for native inference using IEC 61131-3 code
Constantine Doumanidis
Prashant Hari Narayan Rajput
Michail Maniatakos
25
2
0
21 Feb 2022
Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment
Jemin Lee
Misun Yu
Yongin Kwon
Teaho Kim
MQ
30
17
0
10 Feb 2022
Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations
Xinyu Zhang
Ian Colbert
Ken Kreutz-Delgado
Srinjoy Das
MQ
37
11
0
15 Oct 2021
Shifting Capsule Networks from the Cloud to the Deep Edge
Miguel Costa
Diogo Costa
T. Gomes
Sandro Pinto
34
5
0
06 Oct 2021
4-bit Quantization of LSTM-based Speech Recognition Models
A. Fasoli
Chia-Yu Chen
Mauricio Serrano
Xiao Sun
Naigang Wang
...
Xiaodong Cui
Brian Kingsbury
Wei Zhang
Zoltán Tüske
K. Gopalakrishnan
MQ
26
21
0
27 Aug 2021
Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization
Lukas Stäcker
Juncong Fei
Philipp Heidenreich
Frank Bonarens
J. Rambach
D. Stricker
Christoph Stiller
24
24
0
18 Aug 2021
Improving the Efficiency of Transformers for Resource-Constrained Devices
Hamid Tabani
Ajay Balasubramaniam
Shabbir Marzban
Elahe Arani
Bahram Zonooz
43
20
0
30 Jun 2021
LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
Jiawei Zhao
Steve Dai
Rangharajan Venkatesan
Brian Zimmer
Mustafa Ali
Xuan Li
Brucek Khailany
B. Dally
Anima Anandkumar
MQ
39
13
0
26 Jun 2021
Knowledge distillation: A good teacher is patient and consistent
Lucas Beyer
Xiaohua Zhai
Amelie Royer
L. Markeeva
Rohan Anil
Alexander Kolesnikov
VLM
52
287
0
09 Jun 2021
Reduced Precision Strategies for Deep Learning: A High Energy Physics Generative Adversarial Network Use Case
F. Rehm
S. Vallecorsa
V. Saletore
Hans Pabst
Adel Chaibi
V. Codreanu
Kerstin Borras
D. Krücker
MQ
19
16
0
18 Mar 2021
Reinforcement Learning for Datacenter Congestion Control
Chen Tessler
Yuval Shpigelman
Gal Dalal
Amit Mandelbaum
Doron Haritan Kazakov
Benjamin Fuhrer
Gal Chechik
Shie Mannor
37
32
0
18 Feb 2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai
Rangharajan Venkatesan
Haoxing Ren
B. Zimmer
W. Dally
Brucek Khailany
MQ
33
68
0
08 Feb 2021