ResearchTrend.AI

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
    MQ
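The cited paper's core idea is affine (asymmetric) quantization: a real value r is mapped to an 8-bit integer as q = round(r / S) + Z and recovered as r ≈ S · (q − Z), where the scale S and zero-point Z are chosen so that real 0.0 is exactly representable. A minimal illustrative sketch (the range and values below are assumptions for demonstration, not taken from the paper):

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization: q = clamp(round(r / scale) + zero_point, qmin, qmax)."""
    return [min(qmax, max(qmin, round(r / scale) + zero_point)) for r in values]

def dequantize(qvalues, scale, zero_point):
    """Inverse mapping: r ~= scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qvalues]

# Derive scale and zero-point from an observed real-value range [rmin, rmax]
# (example range; in practice it comes from calibration or training statistics).
rmin, rmax = -1.0, 3.0
scale = (rmax - rmin) / 255.0
zero_point = round(-rmin / scale)  # guarantees real 0.0 maps exactly to an integer

x = [-1.0, 0.0, 1.5, 3.0]
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
# round-trip error per element is bounded by the quantization step `scale`
```

The exact representability of 0.0 matters in practice because zero-padding and ReLU outputs must quantize without error.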

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 / 1,260 papers shown
Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning
Shipeng Bai
Jun Chen
Xintian Shen
Yixuan Qian
Yong Liu
MQ
24
12
0
14 Aug 2023
Efficient Neural PDE-Solvers using Quantization Aware Training
W.V.S.O. van den Dool
Tijmen Blankevoort
Max Welling
Yuki M. Asano
MQ
38
3
0
14 Aug 2023
Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution
Ao Li
Le Zhang
Yun-Hai Liu
Ce Zhu
33
11
0
09 Aug 2023
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang
Lin Zhang
S. Shi
Xiaowen Chu
Bo-wen Li
AI4CE
18
92
0
07 Aug 2023
Tango: rethinking quantization for graph neural network training on GPUs
Shiyang Chen
Da Zheng
Caiwen Ding
Chengying Huan
Yuede Ji
Hang Liu
GNN
MQ
31
5
0
02 Aug 2023
MRQ: Support Multiple Quantization Schemes through Model Re-Quantization
Manasa Manohara
Sankalp Dayal
Tarqi Afzal
Rahul Bakshi
Kahkuen Fu
MQ
24
0
0
01 Aug 2023
Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation
Stylianos I. Venieris
Javier Fernandez-Marques
Nicholas D. Lane
MQ
35
3
0
25 Jul 2023
Adaptive ResNet Architecture for Distributed Inference in Resource-Constrained IoT Systems
Fazeela Mazhar Khan
Emna Baccour
A. Erbad
Mounir Hamdi
31
2
0
21 Jul 2023
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Peijie Dong
Lujun Li
Zimian Wei
Xin-Yi Niu
Zhiliang Tian
H. Pan
MQ
51
28
0
20 Jul 2023
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
Vasileios Leon
Muhammad Abdullah Hanif
Giorgos Armeniakos
Xun Jiao
Mohamed Bennai
K. Pekmestzi
Dimitrios Soudris
42
3
0
20 Jul 2023
TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge
Young D. Kwon
Rui Li
Stylianos I. Venieris
Jagmohan Chauhan
Nicholas D. Lane
Cecilia Mascolo
24
8
0
19 Jul 2023
Towards Trustworthy Dataset Distillation
Shijie Ma
Fei Zhu
Zhen Cheng
Xu-Yao Zhang
DD
42
14
0
18 Jul 2023
PLiNIO: A User-Friendly Library of Gradient-based Methods for Complexity-aware DNN Optimization
Daniele Jahier Pagliari
Matteo Risso
Beatrice Alessandra Motetti
Luca Bompani
21
8
0
18 Jul 2023
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
48
63
0
16 Jul 2023
Learning Kernel-Modulated Neural Representation for Efficient Light Field Compression
Jinglei Shi
Yihong Xu
C. Guillemot
32
6
0
12 Jul 2023
Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill
Sourav Dutta
VLM
MQ
42
1
0
12 Jul 2023
Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving Perception
Chi-Chih Chang
Wei-Cheng Lin
Peide Wang
Shengtao Yu
Yunrong Lu
Kuan-Cheng Lin
Kaiyang Wu
VLM
39
13
0
10 Jul 2023
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jorn W. T. Peters
Marios Fournarakis
Markus Nagel
M. V. Baalen
Tijmen Blankevoort
MQ
27
5
0
10 Jul 2023
Pruning vs Quantization: Which is Better?
Andrey Kuzmin
Markus Nagel
M. V. Baalen
Arash Behboodi
Tijmen Blankevoort
MQ
27
48
0
06 Jul 2023
Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training
Dario Lazzaro
Antonio Emanuele Cinà
Maura Pintor
Ambra Demontis
Battista Biggio
Fabio Roli
Marcello Pelillo
38
7
0
01 Jul 2023
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency
Yan Wang
Yuhang Li
Ruihao Gong
Aishan Liu
Yanfei Wang
...
Yongqiang Yao
Yunchen Zhang
Tianzi Xiao
F. Yu
Xianglong Liu
AAML
34
0
0
01 Jul 2023
Q-YOLO: Efficient Inference for Real-time Object Detection
Mingze Wang
H. Sun
Jun Shi
Xuhui Liu
Baochang Zhang
Xianbin Cao
ObjD
42
8
0
01 Jul 2023
Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler
Shaohui Lin
Wenxuan Huang
Jiao Xie
Baochang Zhang
Yunhang Shen
Zhou Yu
Jungong Han
David Doermann
25
2
0
01 Jul 2023
Designing strong baselines for ternary neural network quantization through support and mass equalization
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
MQ
30
0
0
30 Jun 2023
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs
Haihao Shen
Hengyu Meng
Bo Dong
Zhe Wang
Ofir Zafrir
...
Hanwen Chang
Qun Gao
Zi. Wang
Guy Boudoukh
Moshe Wasserblat
MoE
38
4
0
28 Jun 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
66
261
0
24 Jun 2023
Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression
Tianhong Huang
Victor Agostinelli
Lizhong Chen
MQ
20
0
0
24 Jun 2023
QNNRepair: Quantized Neural Network Repair
Xidan Song
Youcheng Sun
Mustafa A. Mustafa
Lucas C. Cordeiro
MQ
36
1
0
23 Jun 2023
Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard
26
0
0
23 Jun 2023
Explainable Lifelong Stream Learning Based on "Glocal" Pairwise Fusion
C. K. Loo
W. S. Liew
S. Wermter
CLL
19
0
0
23 Jun 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
23
87
0
22 Jun 2023
Training Transformers with 4-bit Integers
Haocheng Xi
Changhao Li
Jianfei Chen
Jun Zhu
MQ
25
48
0
21 Jun 2023
DGEMM on Integer Matrix Multiplication Unit
Hiroyuki Ootomo
K. Ozaki
Rio Yokota
17
12
0
21 Jun 2023
HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation
Ho Man Kwan
Ge Gao
Fan Zhang
Andrew Gower
David Bull
24
49
0
16 Jun 2023
Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes
Francesco Daghero
Luca Bompani
Enrico Macii
P. Montuschi
M. Poncino
Daniele Jahier Pagliari
31
5
0
16 Jun 2023
RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge
Adithya Krishna
Srikanth Rohit Nudurupati
Chandana D G
Pritesh Dwivedi
André van Schaik
M. Mehendale
Chetan Singh Thakur
30
12
0
10 Jun 2023
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Haoran You
Huihong Shi
Yipin Guo
Yingyan Lin
37
16
0
10 Jun 2023
Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference
Matteo Risso
Luca Bompani
G. M. Sarda
Luca Benini
Enrico Macii
M. Poncino
Marian Verhelst
Daniele Jahier Pagliari
30
4
0
08 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
38
1
0
07 Jun 2023
Temporal Dynamic Quantization for Diffusion Models
Junhyuk So
Jungwon Lee
Daehyun Ahn
Hyungjun Kim
Eunhyeok Park
DiffM
MQ
23
60
0
04 Jun 2023
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
Daniel Rotem
Michael Hassid
Jonathan Mamou
Roy Schwartz
25
5
0
04 Jun 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL
MQ
47
480
0
01 Jun 2023
Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model
Xiaohuai Le
Tong Lei
Li Chen
Yiqing Guo
Chao-Peng He
...
Hua-Jing Gao
Yijian Xiao
Piao Ding
Shenyi Song
Jing Lu
34
4
0
01 Jun 2023
Intriguing Properties of Quantization at Scale
Arash Ahmadian
Saurabh Dash
Hongyu Chen
Bharat Venkitesh
Stephen Gou
Phil Blunsom
Ahmet Üstün
Sara Hooker
MQ
54
38
0
30 May 2023
Compact Real-time Radiance Fields with Neural Codebook
Lingzhi Li
Zhongshu Wang
Zhen Shen
Li Shen
Ping Tan
29
4
0
29 May 2023
Reducing Communication for Split Learning by Randomized Top-k Sparsification
Fei Zheng
Chaochao Chen
Lingjuan Lyu
Binhui Yao
FedML
31
10
0
29 May 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
60
191
0
29 May 2023
Random-Access Neural Compression of Material Textures
Karthik Vaidyanathan
Marco Salvi
Bartlomiej Wronski
T. Akenine-Moller
P. Ebelin
Aaron E. Lefohn
27
22
0
26 May 2023
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
Ahmed F. AbouElhamayed
Angela Cui
Javier Fernandez-Marques
Nicholas D. Lane
Mohamed S. Abdelfattah
MQ
29
4
0
25 May 2023
Free Lunch for Efficient Textual Commonsense Integration in Language Models
Wanyun Cui
Xingran Chen
40
3
0
24 May 2023