HAQ: Hardware-Aware Automated Quantization with Mixed Precision

21 November 2018
Kuan Wang
Zhijian Liu
Yujun Lin
Ji Lin
Song Han
    MQ

Papers citing "HAQ: Hardware-Aware Automated Quantization with Mixed Precision"

Showing 50 of 435 citing papers.
Tailor: Altering Skip Connections for Resource-Efficient Inference
Olivia Weng
Gabriel Marcano
Vladimir Loncar
Alireza Khodamoradi
Nojan Sheybani
Andres Meza
F. Koushanfar
K. Denolf
Javier Mauricio Duarte
Ryan Kastner
46
12
0
18 Jan 2023
Hyperspherical Quantization: Toward Smaller and More Accurate Models
Dan Liu
X. Chen
Chen Ma
Xue Liu
MQ
30
3
0
24 Dec 2022
Hyperspherical Loss-Aware Ternary Quantization
Dan Liu
Xue Liu
MQ
27
0
0
24 Dec 2022
Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization
Seongmin Park
Beomseok Kwon
Jieun Lim
Kyuyoung Sim
Taeho Kim
Jungwook Choi
MQ
8
1
0
21 Dec 2022
CSMPQ:Class Separability Based Mixed-Precision Quantization
Ming-Yu Wang
Taisong Jin
Miaohui Zhang
Zhengtao Yu
MQ
31
0
0
20 Dec 2022
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers
Zhikai Li
Junrui Xiao
Lianwei Yang
Qingyi Gu
MQ
26
82
0
16 Dec 2022
NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution
Stylianos I. Venieris
Mario Almeida
Royson Lee
Nicholas D. Lane
SupR
23
4
0
15 Dec 2022
Towards Hardware-Specific Automatic Compression of Neural Networks
Torben Krieger
Bernhard Klein
Holger Fröning
MQ
27
2
0
15 Dec 2022
PD-Quant: Post-Training Quantization based on Prediction Difference Metric
Jiawei Liu
Lin Niu
Zhihang Yuan
Dawei Yang
Xinggang Wang
Wenyu Liu
MQ
96
69
0
14 Dec 2022
Vertical Layering of Quantized Neural Networks for Heterogeneous Inference
Hai Wu
Ruifei He
Hao Hao Tan
Xiaojuan Qi
Kaibin Huang
MQ
27
2
0
10 Dec 2022
CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification
Lirui Xiao
Huanrui Yang
Zhen Dong
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
27
10
0
06 Dec 2022
Make RepVGG Greater Again: A Quantization-aware Approach
Xiangxiang Chu
Liang Li
Bo-Wen Zhang
MQ
39
46
0
03 Dec 2022
Boosted Dynamic Neural Networks
Haichao Yu
Haoxiang Li
G. Hua
Gao Huang
Humphrey Shi
35
7
0
30 Nov 2022
Class-based Quantization for Neural Networks
Wenhao Sun
Grace Li Zhang
Huaxi Gu
Bing Li
Ulf Schlichtmann
MQ
24
7
0
27 Nov 2022
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention
Wenyuan Zeng
Meng Li
Wenjie Xiong
Tong Tong
Wen-jie Lu
Jin Tan
Runsheng Wang
Ru Huang
24
20
0
25 Nov 2022
NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization
Shitao Tang
Sicong Tang
Andrea Tagliasacchi
Ping Tan
Yasutaka Furukawa
3DPC
25
17
0
21 Nov 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
MQ
61
741
0
18 Nov 2022
Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
Muyang Li
Ji Lin
Chenlin Meng
Stefano Ermon
Song Han
Jun-Yan Zhu
DiffM
40
45
0
03 Nov 2022
QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
Shira Guskin
Moshe Wasserblat
Chang Wang
Haihao Shen
MQ
16
2
0
31 Oct 2022
MinUn: Accurate ML Inference on Microcontrollers
Shikhar Jaiswal
R. Goli
Aayan Kumar
Vivek Seshadri
Rahul Sharma
26
2
0
29 Oct 2022
Fast DistilBERT on CPUs
Haihao Shen
Ofir Zafrir
Bo Dong
Hengyu Meng
Xinyu. Ye
Zhe Wang
Yi Ding
Hanwen Chang
Guy Boudoukh
Moshe Wasserblat
VLM
29
2
0
27 Oct 2022
Zero-Shot Learning of a Conditional Generative Adversarial Network for Data-Free Network Quantization
Yoojin Choi
Mostafa El-Khamy
Jungwon Lee
GAN
24
1
0
26 Oct 2022
Approximating Continuous Convolutions for Deep Network Compression
Theo W. Costain
V. Prisacariu
36
0
0
17 Oct 2022
ODG-Q: Robust Quantization via Online Domain Generalization
Chaofan Tao
Ngai Wong
MQ
39
1
0
17 Oct 2022
FIT: A Metric for Model Sensitivity
Ben Zandonati
Adrian Alan Pol
M. Pierini
Olya Sirkin
Tal Kopetz
MQ
24
8
0
16 Oct 2022
Deep learning model compression using network sensitivity and gradients
M. Sakthi
N. Yadla
Raj Pawate
21
2
0
11 Oct 2022
Energy-Efficient Deployment of Machine Learning Workloads on Neuromorphic Hardware
Peyton S. Chandarana
Mohammadreza Mohammadi
J. Seekings
Ramtin Zand
41
6
0
10 Oct 2022
In-situ Model Downloading to Realize Versatile Edge AI in 6G Mobile Networks
Kaibin Huang
Hai Wu
Zhiyan Liu
Xiaojuan Qi
13
9
0
07 Oct 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
82
31
0
14 Sep 2022
Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks
Francesco Daghero
Alessio Burrello
Chen Xie
Marco Castellano
Luca Gandolfi
A. Calimera
Enrico Macii
M. Poncino
Daniele Jahier Pagliari
BDL
HAI
21
19
0
02 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
20
55
0
30 Aug 2022
SONAR: Joint Architecture and System Optimization Search
Elias Jääsaari
Michelle Ma
Ameet Talwalkar
Tianqi Chen
36
1
0
25 Aug 2022
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
Elias Frantar
Sidak Pal Singh
Dan Alistarh
MQ
28
217
0
24 Aug 2022
Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey
Dalin Zhang
Kaixuan Chen
Yan Zhao
B. Yang
Li-Ping Yao
Christian S. Jensen
48
3
0
22 Aug 2022
Combining Gradients and Probabilities for Heterogeneous Approximation of Neural Networks
E. Trommer
Bernd Waschneck
Akash Kumar
20
6
0
15 Aug 2022
Mixed-Precision Neural Networks: A Survey
M. Rakka
M. Fouda
Pramod P. Khargonekar
Fadi J. Kurdahi
MQ
25
11
0
11 Aug 2022
Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization
Zechao Li
Mengshu Sun
Alec Lu
Haoyu Ma
Geng Yuan
...
Yanyu Li
M. Leeser
Zhangyang Wang
Xue Lin
Zhenman Fang
ViT
MQ
22
50
0
10 Aug 2022
Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA
Cecilia Latotzke
Tim Ciesielski
T. Gemmeke
MQ
13
7
0
09 Aug 2022
Quantized Sparse Weight Decomposition for Neural Network Compression
Andrey Kuzmin
M. V. Baalen
Markus Nagel
Arash Behboodi
MQ
14
3
0
22 Jul 2022
CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution
Chee Hong
Sungyong Baik
Heewon Kim
Seungjun Nah
Kyoung Mu Lee
SupR
MQ
31
32
0
21 Jul 2022
Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach
Jiseok Youn
Jaehun Song
Hyung-Sin Kim
S. Bahk
MQ
22
8
0
20 Jul 2022
Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss
Daning Cheng
Wenguang Chen
MQ
24
0
0
20 Jul 2022
Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation
Yu-Shan Tai
Cheng-Yang Chang
Chieh-Fang Teng
An-Yeu (Andy) Wu
30
5
0
16 Jul 2022
STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining
Liwei Guo
Wonkyo Choe
F. Lin
24
14
0
11 Jul 2022
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks
Yongming Rao
Zuyan Liu
Wenliang Zhao
Jie Zhou
Jiwen Lu
ViT
44
36
0
04 Jul 2022
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Zhikai Li
Qingyi Gu
MQ
57
95
0
04 Jul 2022
On-Device Training Under 256KB Memory
Ji Lin
Ligeng Zhu
Wei-Ming Chen
Wei-Chen Wang
Chuang Gan
Song Han
MQ
30
197
0
30 Jun 2022
QUIDAM: A Framework for Quantization-Aware DNN Accelerator and Model Co-Exploration
A. Inci
Siri Garudanagiri Virupaksha
Aman Jain
Ting-Wu Chin
Venkata Vivek Thallam
Ruizhou Ding
Diana Marculescu
MQ
23
3
0
30 Jun 2022
Computational Complexity Evaluation of Neural Network Applications in Signal Processing
Pedro J. Freire
S. Srivallapanondh
A. Napoli
Jaroslaw E. Prilepsky
S. Turitsyn
37
1
0
24 Jun 2022
Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes
Matteo Risso
Alessio Burrello
Luca Benini
Enrico Macii
M. Poncino
Daniele Jahier Pagliari
MQ
18
11
0
17 Jun 2022