arXiv: 1811.08886 (v3, latest)
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
21 November 2018
Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
Papers citing "HAQ: Hardware-Aware Automated Quantization with Mixed Precision" (50 of 436 papers shown)
- Compression Aware Certified Training
  Changming Xu, Gagandeep Singh. 13 Jun 2025.
- Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs
  Jung Hyun Lee, Seungjae Shin, Vinnam Kim, Jaeseong You, An Chen. 10 Jun 2025. [MQ]
- Towards a Small Language Model Lifecycle Framework
  Parsa Miraghaei, Sergio Moreschini, Antti Kolehmainen, David Hästbacka. 09 Jun 2025.
- MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
  Haojie Duanmu, Xiuhong Li, Zhihang Yuan, Size Zheng, Jiangfei Duan, Xingcheng Zhang, Dahua Lin. 09 May 2025. [MQ, MoE]
- Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
  Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu. 08 May 2025. [MQ]
- Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
  Navin Ranjan, Andreas E. Savakis. 08 May 2025. [MQ, VLM]
- Radio: Rate-Distortion Optimization for Large Language Model Compression
  Sean I. Young. 05 May 2025. [MQ]
- Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
  Sanjay Surendranath Girija, Shashank Kapoor, Lakshit Arora, Dipen Pradhan, Aman Raj, Ankit Shetgaonkar. 05 May 2025.
- BackSlash: Rate Constrained Optimized Training of Large Language Models
  Jun Wu, Jiangtao Wen, Yuxing Han. 23 Apr 2025.
- FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
  Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, Siyang Song, Brucek Khailany. 19 Apr 2025. [MQ]
- Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
  Chaoyue Niu, Yucheng Ding, Junhui Lu, Zhengxiang Huang, Hang Zeng, Yutong Dai, Xuezhen Tu, Chengfei Lv, Fan Wu, Guihai Chen. 17 Apr 2025.
- Tin-Tin: Towards Tiny Learning on Tiny Devices with Integer-based Neural Network Training
  Yi Hu, Jinhang Zuo, Eddie Zhang, Bob Iannucci, Carlee Joe-Wong. 13 Apr 2025.
- Generative Artificial Intelligence for Internet of Things Computing: A Systematic Survey
  Fabrizio Mangione, Claudio Savaglio, Giancarlo Fortino. 10 Apr 2025.
- Hyperflows: Pruning Reveals the Importance of Weights
  Eugen Barbulescu, Antonio Alexoaie. 06 Apr 2025.
- Model Hemorrhage and the Robustness Limits of Large Language Models
  Ziyang Ma, Zehan Li, Lefei Zhang, Gui-Song Xia, Bo Du, Liangpei Zhang, Dacheng Tao. 31 Mar 2025.
- Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
  Hung-Yueh Chiang, Chi-chih Chang, N. Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu. 28 Mar 2025. [MQ]
- MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
  Zihao Zheng, Xiuping Cui, Size Zheng, Maoliang Li, Jiayu Chen, Yun Liang, Xiang Chen. 27 Mar 2025. [MQ, MoE]
- Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
  El-Mehdi El Arar, Silviu-Ioan Filip, Theo Mary, Elisa Riccietti. 19 Mar 2025.
- ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
  Juncan Deng, Shuaiting Li, Zeyu Wang, Kedong Xu, Hong Gu, Kejie Huang. 12 Mar 2025. [MQ]
- Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models
  Xubin Wang, Zhiqing Tang, Jianxiong Guo, Tianhui Meng, Chenhao Wang, Tian-sheng Wang, Weijia Jia. 08 Mar 2025.
- MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
  Jinguang Wang, Jiangming Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao. 07 Mar 2025. [MQ, MoMe]
- Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time
  Matteo Risso, Luca Bompani, Daniele Jahier Pagliari. 24 Feb 2025.
- KVCrush: Key value cache size-reduction using similarity in head-behaviour
  Gopi Krishna Jha, Sameh Gobriel, Liubov Talamanova, Alexander Kozlov, Nilesh Jain. 24 Feb 2025. [MQ]
- A General Error-Theoretical Analysis Framework for Constructing Compression Strategies
  Boyang Zhang, Daning Cheng, Yunquan Zhang, Meiqi Tu, Fangmin Liu, Jiake Tian. 19 Feb 2025.
- Nearly Lossless Adaptive Bit Switching
  Haiduo Huang, Zhenhua Liu, Tian Xia, Wenzhe Zhao, Pengju Ren. 03 Feb 2025. [MQ]
- Hardware-Aware DNN Compression for Homogeneous Edge Devices
  Kunlong Zhang, Guiying Li, Ning Lu, Peng Yang, K. Tang. 28 Jan 2025.
- Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity
  Navin Ranjan, Andreas E. Savakis. 10 Jan 2025. [MQ]
- Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies
  Xubin Wang, Weijia Jia. 08 Jan 2025.
- A Novel Structure-Agnostic Multi-Objective Approach for Weight-Sharing Compression in Deep Neural Networks
  Rasa Khosrowshahli, Shahryar Rahnamayan, Beatrice Ombuki-Berman. 06 Jan 2025. [MQ]
- DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
  Taesik Gong, F. Kawsar, Chulhong Min. 09 Dec 2024.
- MPQ-Diff: Mixed Precision Quantization for Diffusion Models
  Rocco Manz Maruzzelli, Basile Lewandowski, Lydia Y. Chen. 28 Nov 2024. [DiffM, MQ]
- FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs--Down to 2 Bits!
  Yi Ren, Ruge Xu, Xinfei Guo, Weikang Qian. 27 Nov 2024. [MQ]
- Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
  Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst. 24 Nov 2024.
- SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
  Priyansh Bhatnagar, Linfeng Wen, Mingu Kang. 15 Nov 2024.
- BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
  M. Rakka, Rachid Karami, A. Eltawil, M. Fouda, Fadi J. Kurdahi. 03 Nov 2024. [MQ]
- ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs
  Yuchen Yang, Shubham Ugare, Yifan Zhao, Gagandeep Singh, Sasa Misailovic. 31 Oct 2024. [MQ]
- Data Generation for Hardware-Friendly Post-Training Quantization
  Lior Dikstein, Ariel Lapid, Arnon Netzer, H. Habi. 29 Oct 2024. [MQ]
- Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
  Wen Liu, Xue Xian Zheng, Jingyi Yu, Xin Lou. 25 Oct 2024. [MQ]
- Progressive Mixed-Precision Decoding for Efficient LLM Inference
  Hao Mark Chen, Fuwen Tan, Alexandros Kouris, Royson Lee, Hongxiang Fan, Stylianos I. Venieris. 17 Oct 2024. [MQ]
- Channel-Wise Mixed-Precision Quantization for Large Language Models
  Zihan Chen, Bike Xie, Jundong Li, Cong Shen. 16 Oct 2024. [MQ]
- Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
  Ruhai Lin, Rui-Jie Zhu, Jason K. Eshraghian. 12 Oct 2024.
- MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices
  Mohamed Amine Hamdi, Francesco Daghero, G. M. Sarda, Josse Van Delm, Arne Symons, Luca Benini, Marian Verhelst, Daniele Jahier Pagliari, Luca Bompani. 11 Oct 2024.
- DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
  Yanfeng Jiang, Zelan Yang, B. Chen, Shen Li, Yong Li, Tao Li. 11 Oct 2024. [MQ]
- Constraint Guided Model Quantization of Neural Networks
  Quinten Van Baelen, P. Karsmakers. 30 Sep 2024. [MQ]
- Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
  Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang. 26 Sep 2024.
- UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning
  Kathakoli Sengupta, Zhongkai Shagguan, Sandesh Bharadwaj, Sanjay Arora, Eshed Ohn-Bar, Renato Mancuso. 17 Sep 2024.
- Privacy-Preserving SAM Quantization for Efficient Edge Intelligence in Healthcare
  Zhikai Li, Jing Zhang, Qingyi Gu. 14 Sep 2024. [MedIm]
- Robust Training of Neural Networks at Arbitrary Precision and Sparsity
  Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Andrew G. Howard. 14 Sep 2024. [MQ]
- Foundations of Large Language Model Compression -- Part 1: Weight Quantization
  Sean I. Young. 03 Sep 2024. [MQ]
- Computer Vision Model Compression Techniques for Embedded Systems: A Survey
  Alexandre Lopes, Fernando Pereira dos Santos, D. Oliveira, Mauricio Schiezaro, Hélio Pedrini. 15 Aug 2024.