A White Paper on Neural Network Quantization
15 June 2021
Markus Nagel
Marios Fournarakis
Rana Ali Amjad
Yelysei Bondarenko
M. V. Baalen
Tijmen Blankevoort
MQ
Papers citing
"A White Paper on Neural Network Quantization"
50 / 264 papers shown
Title
TransAxx: Efficient Transformers with Approximate Computing
Dimitrios Danopoulos
Georgios Zervakis
Dimitrios Soudris
Jörg Henkel
ViT
113
2
0
12 Feb 2024
FL-NAS: Towards Fairness of NAS for Resource Constrained Devices via Large Language Models
Ruiyang Qin
Yuting Hu
Zheyu Yan
Jinjun Xiong
Ahmed Abbasi
Yiyu Shi
66
7
0
09 Feb 2024
LQER: Low-Rank Quantization Error Reconstruction for LLMs
Cheng Zhang
Jianyi Cheng
George A. Constantinides
Yiren Zhao
MQ
97
15
0
04 Feb 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
AI4CE
246
18
0
30 Jan 2024
HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference
Tianshi Xu
Meng Li
Runsheng Wang
81
1
0
29 Jan 2024
LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection
Sifan Zhou
Liang Li
Xinyu Zhang
Bo Zhang
Shipeng Bai
Miao Sun
Ziyu Zhao
Xiaobo Lu
Xiangxiang Chu
MQ
79
14
0
29 Jan 2024
CoSS: Co-optimizing Sensor and Sampling Rate for Data-Efficient AI in Human Activity Recognition
Mengxi Liu
Zimin Zhao
Daniel Geissler
Bo Zhou
Sungho Suh
P. Lukowicz
58
0
0
03 Jan 2024
Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching
Pengmiao Zhang
Neelesh Gupta
Rajgopal Kannan
Viktor K. Prasanna
69
3
0
23 Dec 2023
SCoTTi: Save Computation at Training Time with an adaptive framework
Ziyu Li
Enzo Tartaglione
Van-Tam Nguyen
90
0
0
19 Dec 2023
Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting
Dawei Yang
Ning He
Xing Hu
Zhihang Yuan
Jiangyong Yu
Chen Xu
Zhe Jiang
MQ
90
7
0
17 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
129
17
0
13 Dec 2023
MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang
Siqi Li
Junhao Chen
Zhuangzhi Chen
Tianxin Huang
Linpeng Peng
Yong-Jin Liu
53
0
0
12 Dec 2023
Stateful Large Language Model Serving with Pensieve
Lingfan Yu
Jinyang Li
RALM
KELM
LLMAG
77
15
0
09 Dec 2023
MoEC: Mixture of Experts Implicit Neural Compression
Jianchen Zhao
Cheng-Ching Tseng
Ming Lu
Ruichuan An
Xiaobao Wei
He Sun
Shanghang Zhang
81
3
0
03 Dec 2023
The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models
Srinath Namburi
Makesh Narsimhan Sreedhar
Srinath Srinivasan
Frederic Sala
MQ
63
11
0
01 Dec 2023
The Trifecta: Three simple techniques for training deeper Forward-Forward networks
Thomas Dooms
Ing Jyh Tsang
José Oramas
69
4
0
29 Nov 2023
Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization
Jinhao Li
Jiaming Xu
Shiyao Li
Shan Huang
Jun Liu
Yaoxiu Lian
Guohao Dai
MQ
59
3
0
28 Nov 2023
Hybrid Synaptic Structure for Spiking Neural Network Realization
S. Razmkhah
M. A. Karamuftuoglu
A. Bozbey
46
5
0
13 Nov 2023
Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing
Siao Tang
Xin Wang
Hong Chen
Chaoyu Guan
Zewen Wu
Yansong Tang
Wenwu Zhu
MQ
97
16
0
10 Nov 2023
Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks
Kartik Gupta
Akshay Asthana
MQ
36
8
0
09 Nov 2023
Fully Quantized Always-on Face Detector Considering Mobile Image Sensors
Haechang Lee
Wongi Jeong
Dongil Ryu
Hyunwoo Je
Albert No
Kijeong Kim
Se Young Chun
CVBM
59
0
0
02 Nov 2023
Exploring Post-Training Quantization of Protein Language Models
Shuang Peng
Fei Yang
Ning Sun
Sheng Chen
Yanfeng Jiang
Aimin Pan
MQ
52
0
0
30 Oct 2023
QWID: Quantized Weed Identification Deep neural network
Parikshit Singh Rathore
MQ
46
0
0
29 Oct 2023
MOSEL: Inference Serving Using Dynamic Modality Selection
Bodun Hu
Le Xu
Jeongyoon Moon
N. Yadwadkar
Aditya Akella
60
4
0
27 Oct 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar
Dan Alistarh
MQ
MoE
84
29
0
25 Oct 2023
Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients
Maximilian Krahn
Michele Sasdelli
Fengyi Yang
Vladislav Golyanik
Arno Solin
Tat-Jun Chin
Tolga Birdal
MQ
170
2
0
23 Oct 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Filip Szatkowski
Eric Elmoznino
Younesse Kaddar
Simone Scardapane
MoE
60
6
0
06 Oct 2023
A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs
Tianheng Ling
Chao Qian
Lukas Einhaus
Gregor Schiele
31
1
0
04 Oct 2023
MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device
T. V. Rozendaal
Tushar Singhal
Hoang Le
Guillaume Sautière
Amir Said
...
Hitarth Mehta
Frank Mayer
Liang Zhang
Markus Nagel
Auke Wiggers
102
11
0
02 Oct 2023
On Calibration of Modern Quantized Efficient Neural Networks
Joe-Hwa Kuang
Alexander Wong
UQCV
MQ
137
1
0
25 Sep 2023
DeepliteRT: Computer Vision at the Edge
Saad Ashfaq
Alexander Hoffman
Saptarshi Mitra
Sudhakar Sah
Mohammadhossein Askarihemmat
Ehsan Saboori
VLM
MQ
105
1
0
19 Sep 2023
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
Matteo Grimaldi
Darshan C. Ganji
Ivan Lazarevich
Sudhakar Sah
61
10
0
12 Sep 2023
EDAC: Efficient Deployment of Audio Classification Models For COVID-19 Detection
Andrej Jovanović
Mario Mihaly
Lennon Donaldson
70
0
0
11 Sep 2023
Softmax Bias Correction for Quantized Generative Models
N. Pandey
Marios Fournarakis
Chirag I. Patel
Markus Nagel
DiffM
68
11
0
04 Sep 2023
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
Qingyuan Li
Yifan Zhang
Liang Li
Peng Yao
Bo Zhang
Xiangxiang Chu
Yerui Sun
Li-Qiang Du
Yuchen Xie
MQ
108
13
0
30 Aug 2023
ResQ: Residual Quantization for Video Perception
Davide Abati
H. Yahia
Markus Nagel
A. Habibian
MQ
38
2
0
18 Aug 2023
EQ-Net: Elastic Quantization Neural Networks
Ke Xu
Lei Han
Ye Tian
Shangshang Yang
Xingyi Zhang
MQ
124
10
0
15 Aug 2023
Quantization Aware Factorization for Deep Neural Network Compression
Daria Cherniuk
Stanislav Abukhovich
Anh-Huy Phan
Ivan Oseledets
A. Cichocki
Julia Gusak
MQ
74
3
0
08 Aug 2023
Efficient neural supersampling on a novel gaming dataset
Antoine Mercier
Ruan Erasmus
Yash Savani
Manik Dhingra
Fatih Porikli
Guillaume Berger
SupR
66
2
0
03 Aug 2023
Tango: rethinking quantization for graph neural network training on GPUs
Shiyang Chen
Da Zheng
Caiwen Ding
Chengying Huan
Yuede Ji
Hang Liu
GNN
MQ
62
6
0
02 Aug 2023
MRQ:Support Multiple Quantization Schemes through Model Re-Quantization
Manasa Manohara
Sankalp Dayal
Tarqi Afzal
Rahul Bakshi
Kahkuen Fu
MQ
52
0
0
01 Aug 2023
An Automata-Theoretic Approach to Synthesizing Binarized Neural Networks
Ye Tao
Wanwei Liu
Fu Song
Zhen Liang
Jing Wang
Hongxu Zhu
55
1
0
29 Jul 2023
Quantized Feature Distillation for Network Quantization
Kevin Zhu
Yin He
Jianxin Wu
MQ
62
11
0
20 Jul 2023
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jorn W. T. Peters
Marios Fournarakis
Markus Nagel
M. V. Baalen
Tijmen Blankevoort
MQ
49
5
0
10 Jul 2023
Pruning vs Quantization: Which is Better?
Andrey Kuzmin
Markus Nagel
M. V. Baalen
Arash Behboodi
Tijmen Blankevoort
MQ
133
55
0
06 Jul 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
Chong Chen
Yaochu Jin
AIFin
AI4CE
223
98
0
27 Jun 2023
A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware
Shichang Zhang
Atefeh Sohrabizadeh
Cheng Wan
Zijie Huang
Ziniu Hu
Yewen Wang
Yingyan Lin
Jason Cong
Yizhou Sun
GNN
AI4CE
98
25
0
24 Jun 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
123
93
0
22 Jun 2023
Resource Efficient Neural Networks Using Hessian Based Pruning
J. Chong
Manas Gupta
Lihui Chen
59
3
0
12 Jun 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL
MQ
173
587
0
01 Jun 2023