
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
    MQ
arXiv: 1712.05877 (abs / PDF / HTML)

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 / 1,298 papers shown
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Dujian Ding
Bicheng Xu
L. Lakshmanan
VLM
116
2
0
06 Jun 2024
Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania
Siddharth Singh
Shwai He
Soheil Feizi
A. Bhatele
110
22
0
04 Jun 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Widyadewi Soedarmadji
...
Shengen Yan
Huazhong Yang
Xuefei Ning
Yu Wang
MQ, VGen
193
35
0
04 Jun 2024
TinySV: Speaker Verification in TinyML with On-device Learning
Massimo Pavan
Gioele Mombelli
Francesco Sinacori
Manuel Roveri
63
3
0
03 Jun 2024
P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
Huihong Shi
Xin Cheng
Wendong Mao
Zhongfeng Wang
MQ
83
6
0
30 May 2024
Exploiting LLM Quantization
Kazuki Egashira
Mark Vero
Robin Staab
Jingxuan He
Martin Vechev
MQ
78
19
0
28 May 2024
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Tianchen Zhao
Xuefei Ning
Tongcheng Fang
En-hao Liu
Guyue Huang
Zinan Lin
Shengen Yan
Guohao Dai
Yu Wang
MQ, DiffM
131
24
0
28 May 2024
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu
Yuan Cheng
Dawei Yang
Zhihang Yuan
Jiangyong Yu
Chen Xu
Sifan Zhou
MQ
81
8
0
28 May 2024
Extreme Compression of Adaptive Neural Images
Leo Hoshikawa
Marcos V. Conde
Takeshi Ohashi
Atsushi Irie
97
1
0
27 May 2024
DAGER: Exact Gradient Inversion for Large Language Models
Ivo Petrov
Dimitar I. Dimitrov
Maximilian Baader
Mark Niklas Muller
Martin Vechev
FedML
92
5
0
24 May 2024
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
Kunjal Panchal
Nisarg Parikh
Sunav Choudhary
Lijun Zhang
Yuriy Brun
Hui Guan
120
3
0
24 May 2024
BiSup: Bidirectional Quantization Error Suppression for Large Language Models
Minghui Zou
Ronghui Guo
Sai Zhang
Xiaowang Zhang
Zhiyong Feng
MQ
82
1
0
24 May 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang
Hayun Kim
Younghoon Kim
87
15
0
23 May 2024
Super Tiny Language Models
Dylan Hillier
Leon Guertler
Cheston Tan
Palaash Agrawal
Ruirui Chen
Bobby Cheng
113
6
0
23 May 2024
OAC: Output-adaptive Calibration for Accurate Post-training Quantization
Ali Edalati
Alireza Ghaffari
M. Asgharian
Lu Hou
Boxing Chen
Vahid Partovi Nia
MQ
171
0
0
23 May 2024
Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation
Mykhail M. Uss
Ruslan Yermolenko
Olena Kolodiazhna
Oleksii Shashko
Ivan Safonov
Volodymyr Savin
Yoonjae Yeo
Seowon Ji
Jaeyun Jeong
MQ
62
0
0
22 May 2024
QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input
Senmao Tian
Haoyu Gao
Gangyi Hong
Shuyun Wang
JingJie Wang
Xin Yu
Shunli Zhang
MQ
56
1
0
22 May 2024
Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li
Yishuo Cai
Haowei Li
Feng Xue
Zhifeng Li
Yiming Li
MQ, AAML
89
21
0
21 May 2024
Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
Peiyu Liu
Zeming Gao
Wayne Xin Zhao
Yipeng Ma
Tao Wang
Ji-Rong Wen
MQ
145
5
0
21 May 2024
Can formal argumentative reasoning enhance LLMs performances?
Federico Castagna
I. Sassoon
Simon Parsons
LRM, LLMAG
43
2
0
16 May 2024
Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection
Yunqian Fan
Xiuying Wei
Ruihao Gong
Yuqing Ma
Xiangguo Zhang
Qi Zhang
Xianglong Liu
MQ
62
3
0
10 May 2024
Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
Ruihao Gong
Yang Yong
Zining Wang
Jinyang Guo
Xiuying Wei
Yuqing Ma
Xianglong Liu
95
6
0
09 May 2024
Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
Huihong Shi
Haikuo Shao
Wendong Mao
Zhongfeng Wang
ViT, MQ
74
3
0
06 May 2024
Neural Graphics Texture Compression Supporting Random Access
Farzad Farhadzadeh
Qiqi Hou
Hoang Le
Amir Said
Randall Rauwendaal
Alex Bourd
Fatih Porikli
57
1
0
06 May 2024
Collage: Light-Weight Low-Precision Strategy for LLM Training
Tao Yu
Gaurav Gupta
Karthick Gopalswamy
Amith R. Mamidala
Hao Zhou
Jeffrey Huynh
Youngsuk Park
Ron Diamant
Anoop Deoras
Jun Huan
MQ
99
3
0
06 May 2024
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv
Hong Chen
Jinyang Guo
Yifu Ding
Xianglong Liu
VLM, MQ
85
16
0
06 May 2024
Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
Guillem Ramírez
Alexandra Birch
Ivan Titov
101
11
0
03 May 2024
Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios
Nils L. Westhausen
Hendrik Kayser
Theresa Jansen
Bernd T. Meyer
73
4
0
03 May 2024
TinySeg: Model Optimizing Framework for Image Segmentation on Tiny Embedded Systems
Byungchul Chae
Jiae Kim
Seonyeong Heo
VLM
65
2
0
03 May 2024
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Jian Meng
Yuan Liao
Anupreetham Anupreetham
Ahmed Hassan
Shixing Yu
Han-Sok Suh
Xiaofeng Hu
Jae-sun Seo
MQ
91
2
0
02 May 2024
CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications
J. Blumenkamp
Steven D. Morad
Jennifer Gielis
Amanda Prorok
89
5
0
02 May 2024
When Quantization Affects Confidence of Large Language Models?
Irina Proskurina
Luc Brun
Guillaume Metzler
Julien Velcin
MQ
122
2
0
01 May 2024
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Dayou Du
Gu Gong
Xiaowen Chu
MQ
140
8
0
01 May 2024
Wake Vision: A Tailored Dataset and Benchmark Suite for TinyML Computer Vision Applications
Colby R. Banbury
Emil Njor
Andrea Mattia Garavagno
Mark Mazumder
Matthew P. Stewart
Pete Warden
M. Kudlur
Nat Jeffries
Xenofon Fafoutis
Vijay Janapa Reddi
VLM
137
0
0
01 May 2024
Training-free Graph Neural Networks and the Power of Labels as Features
Ryoma Sato
102
4
0
30 Apr 2024
EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision
Yufeng Yang
Adrian Kneip
Charlotte Frenkel
GNN
126
6
0
30 Apr 2024
Dynamical Mode Recognition of Coupled Flame Oscillators by Supervised and Unsupervised Learning Approaches
Weiming Xu
Tao Yang
Peng Zhang
44
3
0
27 Apr 2024
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Dujian Ding
Ankur Mallick
Chi Wang
Robert Sim
Subhabrata Mukherjee
Victor Rühle
L. Lakshmanan
Ahmed Hassan Awadallah
172
107
0
22 Apr 2024
An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang
Xingyu Zheng
Xudong Ma
Haotong Qin
Chengtao Lv
Hong Chen
Jie Luo
Xiaojuan Qi
Xianglong Liu
Michele Magno
MQ
152
42
0
22 Apr 2024
EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder
Hasanul Mahmud
Kevin Desai
P. Lama
Sushil Prasad
85
0
0
21 Apr 2024
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
Pengfei Wu
Jiahao Liu
Zhuocheng Gong
Qifan Wang
Jinpeng Li
Jingang Wang
Xunliang Cai
Dongyan Zhao
67
3
0
18 Apr 2024
QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat
Ahmadreza Jeddi
Reyhane Askari Hemmat
Ivan Lazarevich
Alexander Hoffman
Sudhakar Sah
Ehsan Saboori
Yvon Savaria
Jean-Pierre David
MQ
99
1
0
17 Apr 2024
Efficient and accurate neural field reconstruction using resistive memory
Yifei Yu
Shaocong Wang
Woyu Zhang
Xinyuan Zhang
Xiuzhe Wu
...
Zhongrui Wang
Dashan Shang
Qi Liu
Kwang-Ting Cheng
Ming-Yuan Liu
73
0
0
15 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
134
67
0
08 Apr 2024
Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators
Jan Klhufek
Miroslav Safar
Vojtěch Mrázek
Z. Vašíček
Lukás Sekanina
MQ
81
1
0
08 Apr 2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Aniruddha Nrusimha
Mayank Mishra
Naigang Wang
Dan Alistarh
Yikang Shen
Yoon Kim
MQ
107
10
0
04 Apr 2024
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Chee Hong
Kyoung Mu Lee
SupR, MQ
48
2
0
04 Apr 2024
On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models
Sean Farhat
Deming Chen
118
0
0
04 Apr 2024
DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization
B. Ghavami
Amin Kamjoo
Lesley Shannon
S. Wilton
MQ
45
0
0
03 Apr 2024
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem
Conor McCullough
Randy Hsin
Chas Leichner
Shan Li
...
Andrew G. Howard
Lukasz Lew
Sherief Reda
Ville Rautio
Daniele Moro
MQ
129
1
0
29 Mar 2024