arXiv:1712.05877
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
50 / 1,298 papers shown
Title
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Dujian Ding
Bicheng Xu
L. Lakshmanan
VLM
116
2
0
06 Jun 2024
Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania
Siddharth Singh
Shwai He
Soheil Feizi
A. Bhatele
110
22
0
04 Jun 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Widyadewi Soedarmadji
...
Shengen Yan
Huazhong Yang
Xuefei Ning
Yu Wang
MQ
VGen
193
35
0
04 Jun 2024
TinySV: Speaker Verification in TinyML with On-device Learning
Massimo Pavan
Gioele Mombelli
Francesco Sinacori
Manuel Roveri
63
3
0
03 Jun 2024
P²-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
Huihong Shi
Xin Cheng
Wendong Mao
Zhongfeng Wang
MQ
83
6
0
30 May 2024
Exploiting LLM Quantization
Kazuki Egashira
Mark Vero
Robin Staab
Jingxuan He
Martin Vechev
MQ
78
19
0
28 May 2024
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Tianchen Zhao
Xuefei Ning
Tongcheng Fang
En-hao Liu
Guyue Huang
Zinan Lin
Shengen Yan
Guohao Dai
Yu Wang
MQ
DiffM
131
24
0
28 May 2024
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu
Yuan Cheng
Dawei Yang
Zhihang Yuan
Jiangyong Yu
Chen Xu
Sifan Zhou
MQ
81
8
0
28 May 2024
Extreme Compression of Adaptive Neural Images
Leo Hoshikawa
Marcos V. Conde
Takeshi Ohashi
Atsushi Irie
97
1
0
27 May 2024
DAGER: Exact Gradient Inversion for Large Language Models
Ivo Petrov
Dimitar I. Dimitrov
Maximilian Baader
Mark Niklas Muller
Martin Vechev
FedML
92
5
0
24 May 2024
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
Kunjal Panchal
Nisarg Parikh
Sunav Choudhary
Lijun Zhang
Yuriy Brun
Hui Guan
120
3
0
24 May 2024
BiSup: Bidirectional Quantization Error Suppression for Large Language Models
Minghui Zou
Ronghui Guo
Sai Zhang
Xiaowang Zhang
Zhiyong Feng
MQ
82
1
0
24 May 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang
Hayun Kim
Younghoon Kim
87
15
0
23 May 2024
Super Tiny Language Models
Dylan Hillier
Leon Guertler
Cheston Tan
Palaash Agrawal
Ruirui Chen
Bobby Cheng
113
6
0
23 May 2024
OAC: Output-adaptive Calibration for Accurate Post-training Quantization
Ali Edalati
Alireza Ghaffari
M. Asgharian
Lu Hou
Boxing Chen
Vahid Partovi Nia
MQ
171
0
0
23 May 2024
Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation
Mykhail M. Uss
Ruslan Yermolenko
Olena Kolodiazhna
Oleksii Shashko
Ivan Safonov
Volodymyr Savin
Yoonjae Yeo
Seowon Ji
Jaeyun Jeong
MQ
62
0
0
22 May 2024
QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input
Senmao Tian
Haoyu Gao
Gangyi Hong
Shuyun Wang
JingJie Wang
Xin Yu
Shunli Zhang
MQ
56
1
0
22 May 2024
Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li
Yishuo Cai
Haowei Li
Feng Xue
Zhifeng Li
Yiming Li
MQ
AAML
89
21
0
21 May 2024
Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
Peiyu Liu
Zeming Gao
Wayne Xin Zhao
Yipeng Ma
Tao Wang
Ji-Rong Wen
MQ
145
5
0
21 May 2024
Can formal argumentative reasoning enhance LLMs performances?
Federico Castagna
I. Sassoon
Simon Parsons
LRM
LLMAG
43
2
0
16 May 2024
Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection
Yunqian Fan
Xiuying Wei
Ruihao Gong
Yuqing Ma
Xiangguo Zhang
Qi Zhang
Xianglong Liu
MQ
62
3
0
10 May 2024
Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
Ruihao Gong
Yang Yong
Zining Wang
Jinyang Guo
Xiuying Wei
Yuqing Ma
Xianglong Liu
95
6
0
09 May 2024
Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
Huihong Shi
Haikuo Shao
Wendong Mao
Zhongfeng Wang
ViT
MQ
74
3
0
06 May 2024
Neural Graphics Texture Compression Supporting Random Access
Farzad Farhadzadeh
Qiqi Hou
Hoang Le
Amir Said
Randall Rauwendaal
Alex Bourd
Fatih Porikli
57
1
0
06 May 2024
Collage: Light-Weight Low-Precision Strategy for LLM Training
Tao Yu
Gaurav Gupta
Karthick Gopalswamy
Amith R. Mamidala
Hao Zhou
Jeffrey Huynh
Youngsuk Park
Ron Diamant
Anoop Deoras
Jun Huan
MQ
99
3
0
06 May 2024
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv
Hong Chen
Jinyang Guo
Yifu Ding
Xianglong Liu
VLM
MQ
85
16
0
06 May 2024
Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
Guillem Ramírez
Alexandra Birch
Ivan Titov
101
11
0
03 May 2024
Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios
Nils L. Westhausen
Hendrik Kayser
Theresa Jansen
Bernd T. Meyer
73
4
0
03 May 2024
TinySeg: Model Optimizing Framework for Image Segmentation on Tiny Embedded Systems
Byungchul Chae
Jiae Kim
Seonyeong Heo
VLM
65
2
0
03 May 2024
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Jian Meng
Yuan Liao
Anupreetham Anupreetham
Ahmed Hassan
Shixing Yu
Han-Sok Suh
Xiaofeng Hu
Jae-sun Seo
MQ
91
2
0
02 May 2024
CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications
J. Blumenkamp
Steven D. Morad
Jennifer Gielis
Amanda Prorok
89
5
0
02 May 2024
When Quantization Affects Confidence of Large Language Models?
Irina Proskurina
Luc Brun
Guillaume Metzler
Julien Velcin
MQ
122
2
0
01 May 2024
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Dayou Du
Gu Gong
Xiaowen Chu
MQ
140
8
0
01 May 2024
Wake Vision: A Tailored Dataset and Benchmark Suite for TinyML Computer Vision Applications
Colby R. Banbury
Emil Njor
Andrea Mattia Garavagno
Mark Mazumder
Matthew P. Stewart
Pete Warden
M. Kudlur
Nat Jeffries
Xenofon Fafoutis
Vijay Janapa Reddi
VLM
137
0
0
01 May 2024
Training-free Graph Neural Networks and the Power of Labels as Features
Ryoma Sato
102
4
0
30 Apr 2024
EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision
Yufeng Yang
Adrian Kneip
Charlotte Frenkel
GNN
126
6
0
30 Apr 2024
Dynamical Mode Recognition of Coupled Flame Oscillators by Supervised and Unsupervised Learning Approaches
Weiming Xu
Tao Yang
Peng Zhang
44
3
0
27 Apr 2024
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Dujian Ding
Ankur Mallick
Chi Wang
Robert Sim
Subhabrata Mukherjee
Victor Rühle
L. Lakshmanan
Ahmed Hassan Awadallah
172
107
0
22 Apr 2024
An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang
Xingyu Zheng
Xudong Ma
Haotong Qin
Chengtao Lv
Hong Chen
Jie Luo
Xiaojuan Qi
Xianglong Liu
Michele Magno
MQ
152
42
0
22 Apr 2024
EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder
Hasanul Mahmud
Kevin Desai
P. Lama
Sushil Prasad
85
0
0
21 Apr 2024
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
Pengfei Wu
Jiahao Liu
Zhuocheng Gong
Qifan Wang
Jinpeng Li
Jingang Wang
Xunliang Cai
Dongyan Zhao
67
3
0
18 Apr 2024
QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat
Ahmadreza Jeddi
Reyhane Askari Hemmat
Ivan Lazarevich
Alexander Hoffman
Sudhakar Sah
Ehsan Saboori
Yvon Savaria
Jean-Pierre David
MQ
99
1
0
17 Apr 2024
Efficient and accurate neural field reconstruction using resistive memory
Yifei Yu
Shaocong Wang
Woyu Zhang
Xinyuan Zhang
Xiuzhe Wu
...
Zhongrui Wang
Dashan Shang
Qi Liu
Kwang-Ting Cheng
Ming-Yuan Liu
73
0
0
15 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
134
67
0
08 Apr 2024
Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators
Jan Klhufek
Miroslav Safar
Vojtěch Mrázek
Z. Vašíček
Lukás Sekanina
MQ
81
1
0
08 Apr 2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Aniruddha Nrusimha
Mayank Mishra
Naigang Wang
Dan Alistarh
Yikang Shen
Yoon Kim
MQ
107
10
0
04 Apr 2024
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Chee Hong
Kyoung Mu Lee
SupR
MQ
48
2
0
04 Apr 2024
On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models
Sean Farhat
Deming Chen
118
0
0
04 Apr 2024
DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization
B. Ghavami
Amin Kamjoo
Lesley Shannon
S. Wilton
MQ
45
0
0
03 Apr 2024
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem
Conor McCullough
Randy Hsin
Chas Leichner
Shan Li
...
Andrew G. Howard
Lukasz Lew
Sherief Reda
Ville Rautio
Daniele Moro
MQ
129
1
0
29 Mar 2024