arXiv:1712.05877
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
50 / 1,298 papers shown
Title
Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
Rajiv Movva
Jinhao Lei
Shayne Longpre
Ajay K. Gupta
Chris DuBois
VLM
MQ
69
5
0
20 Aug 2022
FP8 Quantization: The Power of the Exponent
Andrey Kuzmin
M. V. Baalen
Yuwei Ren
Markus Nagel
Jorn W. T. Peters
Tijmen Blankevoort
MQ
95
87
0
19 Aug 2022
Mixed-Precision Neural Networks: A Survey
M. Rakka
M. Fouda
Pramod P. Khargonekar
Fadi J. Kurdahi
MQ
102
13
0
11 Aug 2022
Safety and Performance, Why not Both? Bi-Objective Optimized Model Compression toward AI Software Deployment
Jie Zhu
Leye Wang
Xiao Han
87
10
0
11 Aug 2022
A Time-to-first-spike Coding and Conversion Aware Training for Energy-Efficient Deep Spiking Neural Network Processor Design
Dongwoo Lew
Kyungchul Lee
Jongsun Park
32
14
0
09 Aug 2022
Adaptive Edge Offloading for Image Classification Under Rate Limit
Jiaming Qiu
Ruiqi Wang
Ayan Chakrabarti
Roch Guérin
Chenyang Lu
OffRL
61
14
0
31 Jul 2022
Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
Sein Park
Yeongsang Jang
Eunhyeok Park
MQ
67
2
0
31 Jul 2022
Efficient NLP Model Finetuning via Multistage Data Filtering
Ouyang Xu
S. Ansari
F. Lin
Yangfeng Ji
74
4
0
28 Jul 2022
Reconciling Security and Communication Efficiency in Federated Learning
Karthik Prasad
Sayan Ghosh
Graham Cormode
Ilya Mironov
Ashkan Yousefpour
Pierre Stock
FedML
71
9
0
26 Jul 2022
Versatile Weight Attack via Flipping Limited Bits
Jiawang Bai
Baoyuan Wu
Zhifeng Li
Shutao Xia
AAML
71
20
0
25 Jul 2022
Inference skipping for more efficient real-time speech enhancement with parallel RNNs
Xiaohuai Le
Tong Lei
Kai-Jyun Chen
Jing Lu
132
20
0
22 Jul 2022
Quantized Sparse Weight Decomposition for Neural Network Compression
Andrey Kuzmin
M. V. Baalen
Markus Nagel
Arash Behboodi
MQ
60
3
0
22 Jul 2022
TinyViT: Fast Pretraining Distillation for Small Vision Transformers
Kan Wu
Jinnian Zhang
Houwen Peng
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
79
267
0
21 Jul 2022
Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach
Jiseok Youn
Jaehun Song
Hyung-Sin Kim
S. Bahk
MQ
61
8
0
20 Jul 2022
Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime
Saad Ashfaq
Mohammadhossein Askarihemmat
Sudhakar Sah
Ehsan Saboori
Olivier Mastropietro
Alexander Hoffman
BDL
MQ
38
5
0
18 Jul 2022
Low-bit Shift Network for End-to-End Spoken Language Understanding
Anderson R. Avila
Khalil Bibi
Ruizhi Yang
Xinlin Li
Chao Xing
Xiao Chen
MQ
102
4
0
15 Jul 2022
Low-Precision Arithmetic for Fast Gaussian Processes
Wesley J. Maddox
Andres Potapczynski
A. Wilson
54
12
0
14 Jul 2022
CEG4N: Counter-Example Guided Neural Network Quantization Refinement
J. Matos
I. Bessa
Edoardo Manino
Xidan Song
Lucas C. Cordeiro
MQ
63
2
0
09 Jul 2022
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Zhikai Li
Qingyi Gu
MQ
139
106
0
04 Jul 2022
On-Device Training Under 256KB Memory
Ji Lin
Ligeng Zhu
Wei-Ming Chen
Wei-Chen Wang
Chuang Gan
Song Han
MQ
144
213
0
30 Jun 2022
QUIDAM: A Framework for Quantization-Aware DNN Accelerator and Model Co-Exploration
A. Inci
Siri Garudanagiri Virupaksha
Aman Jain
Ting-Wu Chin
Venkata Vivek Thallam
Ruizhou Ding
Diana Marculescu
MQ
44
3
0
30 Jun 2022
Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding
Connor Holmes
Minjia Zhang
Yuxiong He
Bo Wu
61
3
0
30 Jun 2022
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model
Xudong Pan
Yifan Yan
Sheng Zhang
Mi Zhang
Min Yang
67
1
0
29 Jun 2022
QuantFace: Towards Lightweight Face Recognition by Synthetic Data Low-bit Quantization
Fadi Boutros
Naser Damer
Arjan Kuijper
CVBM
MQ
77
38
0
21 Jun 2022
Low-Precision Stochastic Gradient Langevin Dynamics
Ruqi Zhang
A. Wilson
Chris De Sa
BDL
65
14
0
20 Jun 2022
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park
Baeseong Park
Minsub Kim
Sungjae Lee
Jeonghoon Kim
Beomseok Kwon
S. Kwon
Byeongwook Kim
Youngjoo Lee
Dongsoo Lee
MQ
99
85
0
20 Jun 2022
Fast Lossless Neural Compression with Integer-Only Discrete Flows
Siyu Wang
Jianfei Chen
Chongxuan Li
Jun Zhu
Bo Zhang
MQ
64
7
0
17 Jun 2022
Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes
Matteo Risso
Luca Bompani
Luca Benini
Enrico Macii
Massimo Poncino
Daniele Jahier Pagliari
MQ
68
12
0
17 Jun 2022
Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks
Clemens J. S. Schaefer
Siddharth Joshi
Shane Li
Raul Blazquez
MQ
78
10
0
15 Jun 2022
QONNX: Representing Arbitrary-Precision Quantized Neural Networks
Alessandro Pappalardo
Yaman Umuroglu
Michaela Blott
Jovan Mitrevski
B. Hawks
...
J. Muhizi
Matthew Trahms
Shih-Chieh Hsu
Scott Hauck
Javier Mauricio Duarte
MQ
39
18
0
15 Jun 2022
Two-stage Human Activity Recognition on Microcontrollers with Decision Trees and CNNs
Francesco Daghero
Daniele Jahier Pagliari
Massimo Poncino
BDL
69
12
0
07 Jun 2022
Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge
M. Malka
Erez Farhan
Hai Morgenstern
Nir Shlezinger
FedML
74
13
0
07 Jun 2022
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Aidan Good
Jia-Huei Lin
Hannah Sieg
Mikey Ferguson
Xin Yu
Shandian Zhe
J. Wieczorek
Thiago Serra
104
11
0
07 Jun 2022
DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks
Y. Fu
Haichuan Yang
Jiayi Yuan
Meng Li
Cheng Wan
Raghuraman Krishnamoorthi
Vikas Chandra
Yingyan Lin
130
19
0
02 Jun 2022
NIPQ: Noise proxy-based Integrated Pseudo-Quantization
Juncheol Shin
Junhyuk So
Sein Park
Seungyeop Kang
S. Yoo
Eunhyeok Park
53
28
0
02 Jun 2022
FELARE: Fair Scheduling of Machine Learning Tasks on Heterogeneous Edge Systems
Ali Mokhtari
Md. Abir Hossen
Pooyan Jamshidi
M. Salehi
57
9
0
31 May 2022
Machine Learning for Microcontroller-Class Hardware: A Review
Swapnil Sayan Saha
S. Sandha
Mani B. Srivastava
109
125
0
29 May 2022
Adaptive Random Forests for Energy-Efficient Inference on Microcontrollers
Francesco Daghero
Luca Bompani
Chen Xie
Luca Benini
A. Calimera
Enrico Macii
Massimo Poncino
Daniele Jahier Pagliari
59
8
0
27 May 2022
Ultra-compact Binary Neural Networks for Human Activity Recognition on RISC-V Processors
Francesco Daghero
Chenhao Xie
Daniele Jahier Pagliari
Luca Bompani
Marco Castellano
Luca Gandolfi
A. Calimera
Enrico Macii
Massimo Poncino
BDL
MQ
93
13
0
25 May 2022
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na
Sanket Vaibhav Mehta
Emma Strubell
116
20
0
25 May 2022
Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression
M. Deutel
Philipp Woller
Christopher Mutschler
Jürgen Teich
113
4
0
20 May 2022
Fast matrix multiplication for binary and ternary CNNs on ARM CPU
A. Trusov
E. Limonova
D. Nikolaev
V. Arlazarov
MQ
57
5
0
18 May 2022
Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
Paul Wimmer
Jens Mehnert
Alexandru Paul Condurache
DD
98
21
0
17 May 2022
On Algebraic Constructions of Neural Networks with Small Weights
Kordag Mehmet Kilic
Jin Sima
J. Bruck
29
2
0
17 May 2022
ImageSig: A signature transform for ultra-lightweight image recognition
Mohamed Ramzy Ibrahim
Terry Lyons
VLM
141
7
0
13 May 2022
A Framework for Event-based Computer Vision on a Mobile Device
Gregor Lenz
S. Picaud
S. Ieng
FedML
55
1
0
13 May 2022
Fast Conditional Network Compression Using Bayesian HyperNetworks
Phuoc Nguyen
T. Tran
Ky Le
Sunil R. Gupta
Santu Rana
Dang Nguyen
Trong Nguyen
S. Ryan
Svetha Venkatesh
BDL
54
7
0
13 May 2022
Adaptive Block Floating-Point for Analog Deep Learning Hardware
Ayon Basumallik
D. Bunandar
Nicholas Dronen
Nicholas Harris
Ludmila Levkova
Calvin McCarter
Lakshmi Nair
David Walter
David Widemann
53
7
0
12 May 2022
Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures
Yongji Wu
Matthew Lentz
Danyang Zhuo
Yao Lu
84
24
0
10 May 2022
A Survey on AI Sustainability: Emerging Trends on Learning Algorithms and Research Challenges
Zhenghua Chen
Min-man Wu
Alvin Chan
Xiaoli Li
Yew-Soon Ong
54
7
0
08 May 2022