ResearchTrend.AI
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (arXiv:1712.05877)

15 December 2017
Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko
Tags: MQ

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 of 1,298 citing papers shown.
- Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
  Rajiv Movva, Jinhao Lei, Shayne Longpre, Ajay K. Gupta, Chris DuBois. Tags: VLM, MQ. 20 Aug 2022.
- FP8 Quantization: The Power of the Exponent
  Andrey Kuzmin, M. V. Baalen, Yuwei Ren, Markus Nagel, Jorn W. T. Peters, Tijmen Blankevoort. Tags: MQ. 19 Aug 2022.
- Mixed-Precision Neural Networks: A Survey
  M. Rakka, M. Fouda, Pramod P. Khargonekar, Fadi J. Kurdahi. Tags: MQ. 11 Aug 2022.
- Safety and Performance, Why not Both? Bi-Objective Optimized Model Compression toward AI Software Deployment
  Jie Zhu, Leye Wang, Xiao Han. 11 Aug 2022.
- A Time-to-first-spike Coding and Conversion Aware Training for Energy-Efficient Deep Spiking Neural Network Processor Design
  Dongwoo Lew, Kyungchul Lee, Jongsun Park. 09 Aug 2022.
- Adaptive Edge Offloading for Image Classification Under Rate Limit
  Jiaming Qiu, Ruiqi Wang, Ayan Chakrabarti, Roch Guérin, Chenyang Lu. Tags: OffRL. 31 Jul 2022.
- Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
  Sein Park, Yeongsang Jang, Eunhyeok Park. Tags: MQ. 31 Jul 2022.
- Efficient NLP Model Finetuning via Multistage Data Filtering
  Ouyang Xu, S. Ansari, F. Lin, Yangfeng Ji. 28 Jul 2022.
- Reconciling Security and Communication Efficiency in Federated Learning
  Karthik Prasad, Sayan Ghosh, Graham Cormode, Ilya Mironov, Ashkan Yousefpour, Pierre Stock. Tags: FedML. 26 Jul 2022.
- Versatile Weight Attack via Flipping Limited Bits
  Jiawang Bai, Baoyuan Wu, Zhifeng Li, Shutao Xia. Tags: AAML. 25 Jul 2022.
- Inference skipping for more efficient real-time speech enhancement with parallel RNNs
  Xiaohuai Le, Tong Lei, Kai-Jyun Chen, Jing Lu. 22 Jul 2022.
- Quantized Sparse Weight Decomposition for Neural Network Compression
  Andrey Kuzmin, M. V. Baalen, Markus Nagel, Arash Behboodi. Tags: MQ. 22 Jul 2022.
- TinyViT: Fast Pretraining Distillation for Small Vision Transformers
  Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan. Tags: ViT. 21 Jul 2022.
- Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach
  Jiseok Youn, Jaehun Song, Hyung-Sin Kim, S. Bahk. Tags: MQ. 20 Jul 2022.
- Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime
  Saad Ashfaq, Mohammadhossein Askarihemmat, Sudhakar Sah, Ehsan Saboori, Olivier Mastropietro, Alexander Hoffman. Tags: BDL, MQ. 18 Jul 2022.
- Low-bit Shift Network for End-to-End Spoken Language Understanding
  Anderson R. Avila, Khalil Bibi, Ruizhi Yang, Xinlin Li, Chao Xing, Xiao Chen. Tags: MQ. 15 Jul 2022.
- Low-Precision Arithmetic for Fast Gaussian Processes
  Wesley J. Maddox, Andres Potapczynski, A. Wilson. 14 Jul 2022.
- CEG4N: Counter-Example Guided Neural Network Quantization Refinement
  J. Matos, I. Bessa, Edoardo Manino, Xidan Song, Lucas C. Cordeiro. Tags: MQ. 09 Jul 2022.
- I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
  Zhikai Li, Qingyi Gu. Tags: MQ. 04 Jul 2022.
- On-Device Training Under 256KB Memory
  Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han. Tags: MQ. 30 Jun 2022.
- QUIDAM: A Framework for Quantization-Aware DNN Accelerator and Model Co-Exploration
  A. Inci, Siri Garudanagiri Virupaksha, Aman Jain, Ting-Wu Chin, Venkata Vivek Thallam, Ruizhou Ding, Diana Marculescu. Tags: MQ. 30 Jun 2022.
- Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding
  Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu. 30 Jun 2022.
- Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model
  Xudong Pan, Yifan Yan, Sheng Zhang, Mi Zhang, Min Yang. 29 Jun 2022.
- QuantFace: Towards Lightweight Face Recognition by Synthetic Data Low-bit Quantization
  Fadi Boutros, Naser Damer, Arjan Kuijper. Tags: CVBM, MQ. 21 Jun 2022.
- Low-Precision Stochastic Gradient Langevin Dynamics
  Ruqi Zhang, A. Wilson, Chris De Sa. Tags: BDL. 20 Jun 2022.
- LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
  Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, S. Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee. Tags: MQ. 20 Jun 2022.
- Fast Lossless Neural Compression with Integer-Only Discrete Flows
  Siyu Wang, Jianfei Chen, Chongxuan Li, Jun Zhu, Bo Zhang. Tags: MQ. 17 Jun 2022.
- Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes
  Matteo Risso, Luca Bompani, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari. Tags: MQ. 17 Jun 2022.
- Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks
  Clemens J. S. Schaefer, Siddharth Joshi, Shane Li, Raul Blazquez. Tags: MQ. 15 Jun 2022.
- QONNX: Representing Arbitrary-Precision Quantized Neural Networks
  Alessandro Pappalardo, Yaman Umuroglu, Michaela Blott, Jovan Mitrevski, B. Hawks, ..., J. Muhizi, Matthew Trahms, Shih-Chieh Hsu, Scott Hauck, Javier Mauricio Duarte. Tags: MQ. 15 Jun 2022.
- Two-stage Human Activity Recognition on Microcontrollers with Decision Trees and CNNs
  Francesco Daghero, Daniele Jahier Pagliari, Massimo Poncino. Tags: BDL. 07 Jun 2022.
- Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge
  M. Malka, Erez Farhan, Hai Morgenstern, Nir Shlezinger. Tags: FedML. 07 Jun 2022.
- Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
  Aidan Good, Jia-Huei Lin, Hannah Sieg, Mikey Ferguson, Xin Yu, Shandian Zhe, J. Wieczorek, Thiago Serra. 07 Jun 2022.
- DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks
  Y. Fu, Haichuan Yang, Jiayi Yuan, Meng Li, Cheng Wan, Raghuraman Krishnamoorthi, Vikas Chandra, Yingyan Lin. 02 Jun 2022.
- NIPQ: Noise proxy-based Integrated Pseudo-Quantization
  Juncheol Shin, Junhyuk So, Sein Park, Seungyeop Kang, S. Yoo, Eunhyeok Park. 02 Jun 2022.
- FELARE: Fair Scheduling of Machine Learning Tasks on Heterogeneous Edge Systems
  Ali Mokhtari, Md. Abir Hossen, Pooyan Jamshidi, M. Salehi. 31 May 2022.
- Machine Learning for Microcontroller-Class Hardware: A Review
  Swapnil Sayan Saha, S. Sandha, Mani B. Srivastava. 29 May 2022.
- Adaptive Random Forests for Energy-Efficient Inference on Microcontrollers
  Francesco Daghero, Luca Bompani, Chen Xie, Luca Benini, A. Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari. 27 May 2022.
- Ultra-compact Binary Neural Networks for Human Activity Recognition on RISC-V Processors
  Francesco Daghero, Chenhao Xie, Daniele Jahier Pagliari, Luca Bompani, Marco Castellano, Luca Gandolfi, A. Calimera, Enrico Macii, Massimo Poncino. Tags: BDL, MQ. 25 May 2022.
- Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
  Clara Na, Sanket Vaibhav Mehta, Emma Strubell. 25 May 2022.
- Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression
  M. Deutel, Philipp Woller, Christopher Mutschler, Jürgen Teich. 20 May 2022.
- Fast matrix multiplication for binary and ternary CNNs on ARM CPU
  A. Trusov, E. Limonova, D. Nikolaev, V. Arlazarov. Tags: MQ. 18 May 2022.
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
  Paul Wimmer, Jens Mehnert, Alexandru Paul Condurache. Tags: DD. 17 May 2022.
- On Algebraic Constructions of Neural Networks with Small Weights
  Kordag Mehmet Kilic, Jin Sima, J. Bruck. 17 May 2022.
- ImageSig: A signature transform for ultra-lightweight image recognition
  Mohamed Ramzy Ibrahim, Terry Lyons. Tags: VLM. 13 May 2022.
- A Framework for Event-based Computer Vision on a Mobile Device
  Gregor Lenz, S. Picaud, S. Ieng. Tags: FedML. 13 May 2022.
- Fast Conditional Network Compression Using Bayesian HyperNetworks
  Phuoc Nguyen, T. Tran, Ky Le, Sunil R. Gupta, Santu Rana, Dang Nguyen, Trong Nguyen, S. Ryan, Svetha Venkatesh. Tags: BDL. 13 May 2022.
- Adaptive Block Floating-Point for Analog Deep Learning Hardware
  Ayon Basumallik, D. Bunandar, Nicholas Dronen, Nicholas Harris, Ludmila Levkova, Calvin McCarter, Lakshmi Nair, David Walter, David Widemann. 12 May 2022.
- Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures
  Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu. 10 May 2022.
- A Survey on AI Sustainability: Emerging Trends on Learning Algorithms and Research Challenges
  Zhenghua Chen, Min-man Wu, Alvin Chan, Xiaoli Li, Yew-Soon Ong. 08 May 2022.