arXiv:1911.03852
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
10 November 2019
Zhen Dong
Z. Yao
Yaohui Cai
Daiyaan Arfeen
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
Papers citing
"HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks"
50 / 171 papers shown
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
Navin Ranjan
Andreas E. Savakis
MQ
VLM
68
0
0
08 May 2025
Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction
Changjun Li
Runqing Jiang
Zhuo Song
Pengpeng Yu
Ye Zhang
Yulan Guo
MQ
56
0
0
01 May 2025
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Coleman Hooper
Charbel Sakr
Ben Keller
Rangharajan Venkatesan
Kurt Keutzer
Shri Kiran Srinivasan
Brucek Khailany
MQ
47
0
0
19 Apr 2025
Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions
Tahmid Hasan Prato
Seijoon Kim
Lizhong Chen
Sanghyun Hong
AAML
38
0
0
02 Apr 2025
Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
El-Mehdi El Arar
Silviu-Ioan Filip
Theo Mary
Elisa Riccietti
57
0
0
19 Mar 2025
Towards Extreme Pruning of LLMs with Plug-and-Play Mixed Sparsity
Chi Xu
Gefei Zhang
Yantong Zhu
Luca Benini
Guosheng Hu
Yawei Li
Zhihong Zhang
31
0
0
14 Mar 2025
Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang
Jia Wei
Jintao Zhang
Jun-Jie Zhu
Jianfei Chen
MQ
82
3
0
13 Mar 2025
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng
Shuaiting Li
Zeyu Wang
Kedong Xu
Hong Gu
Kejie Huang
MQ
60
0
0
12 Mar 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin
Shohaib Mahmud
Haiying Shen
Anand Iyer
MoE
179
0
0
10 Mar 2025
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size
Alireza Behtash
Marijan Fofonjka
Ethan Baird
Tyler Mauer
Hossein Moghimifam
David Stout
Joel Dennison
MQ
61
1
0
06 Mar 2025
Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time
Matteo Risso
Alessio Burrello
Daniele Jahier Pagliari
46
0
0
24 Feb 2025
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu
David Aponte
Colby R. Banbury
Daniel P. Robinson
Tianyu Ding
K. Koishida
Ilya Zharkov
Tianyi Chen
MQ
70
1
0
23 Feb 2025
A General Error-Theoretical Analysis Framework for Constructing Compression Strategies
Boyang Zhang
Daning Cheng
Yunquan Zhang
Meiqi Tu
Fangmin Liu
Jiake Tian
41
1
0
19 Feb 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
ALM
MQ
90
0
0
18 Feb 2025
Nearly Lossless Adaptive Bit Switching
Haiduo Huang
Zhenhua Liu
Tian Xia
Wenzhe Zhao
Pengju Ren
MQ
63
0
0
03 Feb 2025
Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity
Navin Ranjan
Andreas E. Savakis
MQ
47
1
0
10 Jan 2025
Pruning-based Data Selection and Network Fusion for Efficient Deep Learning
Humaira Kousar
Hasnain Irshad Bhatti
Jaekyun Moon
37
0
0
03 Jan 2025
Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Y. Park
Jake Hyun
Hojoon Kim
Jae W. Lee
MQ
46
0
0
31 Dec 2024
MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models
Weilun Feng
Haotong Qin
Chuanguang Yang
Zhulin An
Libo Huang
Boyu Diao
Fei Wang
Renshuai Tao
Yongjun Xu
Michele Magno
DiffM
MQ
80
5
0
16 Dec 2024
Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang
Huanrui Yang
MQ
87
1
0
08 Dec 2024
MPQ-Diff: Mixed Precision Quantization for Diffusion Models
Rocco Manz Maruzzelli
Basile Lewandowski
Lydia Y. Chen
DiffM
MQ
108
0
0
28 Nov 2024
DQRM: Deep Quantized Recommendation Models
Yang Zhou
Zhen Dong
Ellick Chan
Dhiraj Kalamkar
Diana Marculescu
Kurt Keutzer
MQ
50
1
0
26 Oct 2024
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao Mark Chen
Fuwen Tan
Alexandros Kouris
Royson Lee
Hongxiang Fan
Stylianos I. Venieris
MQ
28
1
0
17 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
35
2
0
16 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
39
3
0
08 Oct 2024
Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs
Tianheng Ling
Chao Qian
Gregor Schiele
26
0
0
04 Oct 2024
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Yifei Liu
Jicheng Wen
Yang Wang
Shengyu Ye
Li Lyna Zhang
Ting Cao
Cheng Li
Mao Yang
MQ
100
10
0
25 Sep 2024
Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim
Jemin Lee
Sihyeong Park
Yongin Kwon
Hyungshin Kim
MQ
37
0
0
26 Jul 2024
ISQuant: apply squant to the real deployment
Dezan Zhao
MQ
27
0
0
05 Jul 2024
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives
Huanrui Yang
Yafeng Huang
Zhen Dong
Denis A. Gudovskiy
Tomoyuki Okuno
Yohei Nakata
Yuan Du
Kurt Keutzer
Shanghang Zhang
MQ
51
0
0
03 Jul 2024
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti
Matteo Risso
Alessio Burrello
Enrico Macii
M. Poncino
Daniele Jahier Pagliari
MQ
55
2
0
01 Jul 2024
Efficient Neural Compression with Inference-time Decoding
Clément Metz
Olivier Bichler
Antoine Dupret
MQ
27
0
0
10 Jun 2024
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
Bei Liu
Haoyu Wang
Yanmin Qian
MQ
36
1
0
08 Jun 2024
P²-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
Huihong Shi
Xin Cheng
Wendong Mao
Zhongfeng Wang
MQ
48
3
0
30 May 2024
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Haoyu Wang
Bei Liu
Hang Shao
Bo Xiao
Ke Zeng
Guanglu Wan
Yanmin Qian
MQ
31
0
0
27 May 2024
eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
Aditya Agrawal
Matthew Hedlund
Blake A. Hechtman
MQ
33
4
0
22 May 2024
Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li
Yishuo Cai
Haowei Li
Feng Xue
Zhifeng Li
Yiming Li
MQ
AAML
35
20
0
21 May 2024
EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization
Wenxuan Zeng
Tianshi Xu
Meng Li
Runsheng Wang
MQ
38
0
0
15 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
58
48
0
08 Apr 2024
AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search
Junghyup Lee
Bumsub Ham
32
6
0
28 Mar 2024
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi
Yuxiang Chen
Kang Zhao
Kaijun Zheng
Jianfei Chen
Jun Zhu
MQ
42
20
0
19 Mar 2024
Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen
Qiao Yang
Senmao Tian
Shunli Zhang
MQ
28
1
0
27 Feb 2024
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
Ziyi Guan
Hantao Huang
Yupeng Su
Hong Huang
Ngai Wong
Hao Yu
MQ
26
14
0
21 Feb 2024
ParZC: Parametric Zero-Cost Proxies for Efficient NAS
Peijie Dong
Lujun Li
Xinglin Pan
Zimian Wei
Xiang Liu
Qiang-qiang Wang
Xiaowen Chu
66
3
0
03 Feb 2024
HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference
Tianshi Xu
Meng Li
Runsheng Wang
45
0
0
29 Jan 2024
LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
Navin Ranjan
Andreas E. Savakis
MQ
26
6
0
20 Jan 2024
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang
Yuan Meng
Jiacheng Jiang
Shuzhao Xie
Rongwei Lu
Xinzhu Ma
Zhi Wang
Wenwu Zhu
MQ
24
8
0
03 Jan 2024
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen
H. Vikalo
FedML
MQ
16
7
0
29 Nov 2023
Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning
Jiaqi Li
Yuanhao Lai
Rui Wang
Changjian Shui
Sabyasachi Sahoo
Charles Ling
Shichun Yang
Boyu Wang
Christian Gagné
Fan Zhou
CLL
40
1
0
26 Nov 2023
EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
Chenyu Wang
Zhen Dong
Daquan Zhou
Zhenhua Zhu
Yu Wang
Jiashi Feng
Kurt Keutzer
13
2
0
12 Nov 2023