Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1905.03696
Cited By
HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
29 April 2019
Zhen Dong
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision"
50 / 120 papers shown
Title
NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution
Stylianos I. Venieris
Mario Almeida
Royson Lee
Nicholas D. Lane
SupR
23
4
0
15 Dec 2022
Towards Hardware-Specific Automatic Compression of Neural Networks
Torben Krieger
Bernhard Klein
Holger Fröning
MQ
27
2
0
15 Dec 2022
Vertical Layering of Quantized Neural Networks for Heterogeneous Inference
Hai Wu
Ruifei He
Hao Hao Tan
Xiaojuan Qi
Kaibin Huang
MQ
24
2
0
10 Dec 2022
CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification
Lirui Xiao
Huanrui Yang
Zhen Dong
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
27
10
0
06 Dec 2022
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
Yijiang Liu
Huanrui Yang
Zhen Dong
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
31
47
0
29 Nov 2022
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Mingliang Xu
Gongrui Nan
Yuxin Zhang
Rongrong Ji
Rongrong Ji
MQ
18
3
0
12 Nov 2022
Analysis of Quantization on MLP-based Vision Models
Lingran Zhao
Zhen Dong
Kurt Keutzer
MQ
32
7
0
14 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
20
55
0
30 Aug 2022
FP8 Quantization: The Power of the Exponent
Andrey Kuzmin
M. V. Baalen
Yuwei Ren
Markus Nagel
Jorn W. T. Peters
Tijmen Blankevoort
MQ
25
80
0
19 Aug 2022
Mixed-Precision Neural Networks: A Survey
M. Rakka
M. Fouda
Pramod P. Khargonekar
Fadi J. Kurdahi
MQ
25
11
0
11 Aug 2022
Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization
Zechao Li
Mengshu Sun
Alec Lu
Haoyu Ma
Geng Yuan
...
Yanyu Li
M. Leeser
Zhangyang Wang
Xue Lin
Zhenman Fang
ViT
MQ
22
50
0
10 Aug 2022
Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
Sein Park
Yeongsang Jang
Eunhyeok Park
MQ
21
2
0
31 Jul 2022
CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution
Chee Hong
Sungyong Baik
Heewon Kim
Seungjun Nah
Kyoung Mu Lee
SupR
MQ
31
32
0
21 Jul 2022
POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
Shishir G. Patil
Paras Jain
P. Dutta
Ion Stoica
Joseph E. Gonzalez
12
35
0
15 Jul 2022
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Zhikai Li
Qingyi Gu
MQ
57
95
0
04 Jul 2022
Fast Lossless Neural Compression with Integer-Only Discrete Flows
Siyu Wang
Jianfei Chen
Chongxuan Li
Jun Zhu
Bo Zhang
MQ
19
7
0
17 Jun 2022
SDQ: Stochastic Differentiable Quantization with Mixed Precision
Xijie Huang
Zhiqiang Shen
Shichao Li
Zechun Liu
Xianghong Hu
Jeffry Wicaksana
Eric P. Xing
Kwang-Ting Cheng
MQ
19
33
0
09 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
62
442
0
04 Jun 2022
FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation
Ahmad Shawahna
S. M. Sait
A. El-Maleh
Irfan Ahmad
MQ
20
6
0
22 Mar 2022
Quantization in Layer's Input is Matter
Daning Cheng
Wenguang Chen
MQ
11
0
0
10 Feb 2022
Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data
Yaoqing Yang
Ryan Theisen
Liam Hodgkinson
Joseph E. Gonzalez
Kannan Ramchandran
Charles H. Martin
Michael W. Mahoney
88
17
0
06 Feb 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
24
58
0
01 Feb 2022
Post-training Quantization for Neural Networks with Provable Guarantees
Jinjie Zhang
Yixuan Zhou
Rayan Saab
MQ
23
32
0
26 Jan 2022
Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
S. Siddegowda
Marios Fournarakis
Markus Nagel
Tijmen Blankevoort
Chirag I. Patel
Abhijit Khobare
MQ
14
31
0
20 Jan 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari
Conglong Li
Z. Yao
Minjia Zhang
Reza Yazdani Aminabadi
A. A. Awan
Jeff Rasley
Yuxiong He
41
284
0
14 Jan 2022
BMPQ: Bit-Gradient Sensitivity Driven Mixed-Precision Quantization of DNNs from Scratch
Souvik Kundu
Shikai Wang
Qirui Sun
P. Beerel
Massoud Pedram
MQ
29
18
0
24 Dec 2021
Neural Network Quantization for Efficient Inference: A Survey
Olivia Weng
MQ
28
23
0
08 Dec 2021
Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition
Junhao Xu
Jianwei Yu
Shoukang Hu
Xunying Liu
Helen Meng
MQ
27
13
0
29 Nov 2021
Mixed Precision of Quantization of Transformer Language Models for Speech Recognition
Junhao Xu
Shoukang Hu
Jianwei Yu
Xunying Liu
Helen M. Meng
MQ
40
15
0
29 Nov 2021
Sharpness-aware Quantization for Deep Neural Networks
Jing Liu
Jianfei Cai
Bohan Zhuang
MQ
27
24
0
24 Nov 2021
Arch-Net: Model Distillation for Architecture Agnostic Model Deployment
Weixin Xu
Zipeng Feng
Shuangkang Fang
Song Yuan
Yi Yang
Shuchang Zhou
MQ
30
1
0
01 Nov 2021
Whole Brain Segmentation with Full Volume Neural Network
Yeshu Li
Jianwei Cui
Yilun Sheng
Xiao Liang
Jingdong Wang
E. Chang
Yan Xu
32
11
0
29 Oct 2021
Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
Weihan Chen
Peisong Wang
Jian Cheng
MQ
42
62
0
13 Oct 2021
Global Vision Transformer Pruning with Hessian-Aware Saliency
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
30
39
0
10 Oct 2021
Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
25
133
0
27 Sep 2021
Auto-Split: A General Framework of Collaborative Edge-Cloud AI
Amin Banitalebi-Dehkordi
Naveen Vedula
J. Pei
Fei Xia
Lanjun Wang
Yong Zhang
22
89
0
30 Aug 2021
A White Paper on Neural Network Quantization
Markus Nagel
Marios Fournarakis
Rana Ali Amjad
Yelysei Bondarenko
M. V. Baalen
Tijmen Blankevoort
MQ
36
504
0
15 Jun 2021
Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices
Abhishek Ramdas Nair
P. Nath
S. Chakrabartty
Chetan Singh Thakur
48
14
0
03 Jun 2021
HAO: Hardware-aware neural Architecture Optimization for Efficient Inference
Zhen Dong
Yizhao Gao
Qijing Huang
J. Wawrzynek
Hayden Kwok-Hay So
Kurt Keutzer
19
34
0
26 Apr 2021
Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez
Yossi Adi
Gabriel Synnaeve
DiffM
MQ
18
47
0
20 Apr 2021
unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights Generation
Stylianos I. Venieris
Javier Fernandez-Marques
Nicholas D. Lane
24
11
0
09 Mar 2021
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
F. Fahim
B. Hawks
C. Herwig
J. Hirschauer
S. Jindariani
...
J. Ngadiuba
Miaoyuan Liu
Duc Hoang
E. Kreinar
Zhenbin Wu
30
129
0
09 Mar 2021
BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization
Huanrui Yang
Lin Duan
Yiran Chen
Hai Helen Li
MQ
15
64
0
20 Feb 2021
Dynamic Precision Analog Computing for Neural Networks
Sahaj Garg
Joe Lou
Anirudh Jain
Mitchell Nahmias
45
33
0
12 Feb 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
105
341
0
05 Jan 2021
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
142
221
0
31 Dec 2020
Hybrid and Non-Uniform quantization methods using retro synthesis data for efficient inference
Gvsl Tej Pratap
R. Kumar
MQ
24
1
0
26 Dec 2020
A Tiny CNN Architecture for Medical Face Mask Detection for Resource-Constrained Endpoints
P. Mohan
A. Paul
Abhay Chirania
CVBM
24
48
0
30 Nov 2020
Bringing AI To Edge: From Deep Learning's Perspective
Di Liu
Hao Kong
Xiangzhong Luo
Weichen Liu
Ravi Subramaniam
52
116
0
25 Nov 2020
BARS: Joint Search of Cell Topology and Layout for Accurate and Efficient Binary ARchitectures
Tianchen Zhao
Xuefei Ning
Xiangsheng Shi
Songyi Yang
Shuang Liang
Peng Lei
Jianfei Chen
Huazhong Yang
Yu Wang
MQ
26
7
0
21 Nov 2020
Previous
1
2
3
Next