Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1902.06822
Cited By
Low-bit Quantization of Neural Networks for Efficient Inference
18 February 2019
Yoni Choukroun
Eli Kravchik
Fan Yang
P. Kisilev
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Low-bit Quantization of Neural Networks for Efficient Inference"
50 / 182 papers shown
Title
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu
Wenshuo Li
Yehui Tang
Yiman Zhang
Yihao Chen
Houqiang Li
Yunhe Wang
Xinghao Chen
VLM
44
19
0
21 Dec 2023
ARBiBench: Benchmarking Adversarial Robustness of Binarized Neural Networks
Peng Zhao
Jiehua Zhang
Bowen Peng
Longguang Wang
Yingmei Wei
Yu Liu
Li Liu
AAML
29
0
0
21 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
36
13
0
13 Dec 2023
Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review
M. Lê
Pierre Wolinski
Julyan Arbel
34
8
0
20 Nov 2023
Exploring Post-Training Quantization of Protein Language Models
Shuang Peng
Fei Yang
Ning Sun
Sheng Chen
Yanfeng Jiang
Aimin Pan
MQ
27
0
0
30 Oct 2023
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Shih-yang Liu
Zechun Liu
Xijie Huang
Pingcheng Dong
Kwang-Ting Cheng
MQ
19
56
0
25 Oct 2023
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Jing Liu
Ruihao Gong
Xiuying Wei
Zhiwei Dong
Jianfei Cai
Bohan Zhuang
MQ
28
51
0
12 Oct 2023
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang
Wen Fei
Weijia Wu
Yefei He
Zhenyu Lou
Hong Zhou
MQ
22
5
0
07 Oct 2023
MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search
Yichen Xie
Wei Le
MQ
24
4
0
29 Sep 2023
Efficient Post-training Quantization with FP8 Formats
Haihao Shen
Naveen Mellempudi
Xin He
Q. Gao
Chang‐Bao Wang
Mengni Wang
MQ
23
19
0
26 Sep 2023
Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design
Chao Fang
Wei Sun
Aojun Zhou
Zhongfeng Wang
16
3
0
22 Sep 2023
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
Jinjie Zhang
Rayan Saab
22
0
0
20 Sep 2023
Efficient Benchmarking of Language Models
Yotam Perlitz
Elron Bandel
Ariel Gera
Ofir Arviv
L. Ein-Dor
Eyal Shnarch
Noam Slonim
Michal Shmueli-Scheuer
Leshem Choshen
ALM
24
24
0
22 Aug 2023
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers
N. Frumkin
Dibakar Gope
Diana Marculescu
MQ
41
16
0
21 Aug 2023
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
Young Jin Kim
Rawn Henry
Raffy Fahim
Hany Awadalla
MQ
37
19
0
16 Aug 2023
Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning
Shipeng Bai
Jun Chen
Xintian Shen
Yixuan Qian
Yong Liu
MQ
24
12
0
14 Aug 2023
Pruning vs Quantization: Which is Better?
Andrey Kuzmin
Markus Nagel
M. V. Baalen
Arash Behboodi
Tijmen Blankevoort
MQ
27
48
0
06 Jul 2023
Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning
Jun Chen
Shipeng Bai
Tianxin Huang
Mengmeng Wang
Guanzhong Tian
Y. Liu
MQ
38
18
0
02 Jul 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
18
88
0
22 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
30
1
0
07 Jun 2023
PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models
Zhuocheng Gong
Jiahao Liu
Qifan Wang
Yang Yang
Jingang Wang
Wei Wu
Yunsen Xian
Dongyan Zhao
Rui Yan
MQ
33
5
0
30 May 2023
Towards Accurate Post-training Quantization for Diffusion Models
Changyuan Wang
Ziwei Wang
Xiuwei Xu
Yansong Tang
Jie Zhou
Jiwen Lu
MQ
30
21
0
30 May 2023
Post-training Model Quantization Using GANs for Synthetic Data Generation
Athanasios Masouris
Mansi Sharma
Adrian Boguszewski
Alexander Kozlov
Zhuo Wu
Raymond Lo
MQ
19
0
0
10 May 2023
Adaptive Scheduling for Edge-Assisted DNN Serving
Jian He
Chen-Shun Yang
Zhaoyuan He
Ghufran Baig
L. Qiu
19
0
0
19 Apr 2023
Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric
Lin Niu
Jia-Wen Liu
Zhihang Yuan
Dawei Yang
Xinggang Wang
Wenyu Liu
MQ
33
2
0
19 Apr 2023
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Xiuying Wei
Yunchen Zhang
Yuhang Li
Xiangguo Zhang
Ruihao Gong
Jian Ren
Zhengang Li
MQ
27
31
0
18 Apr 2023
EcoFed: Efficient Communication for DNN Partitioning-based Federated Learning
Di Wu
R. Ullah
Philip Rodgers
Peter Kilpatrick
I. Spence
Blesson Varghese
FedML
32
1
0
11 Apr 2023
Towards Accurate Post-Training Quantization for Vision Transformer
Yifu Ding
Haotong Qin
Qing-Yu Yan
Z. Chai
Junjie Liu
Xiaolin K. Wei
Xianglong Liu
MQ
54
68
0
25 Mar 2023
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance
Zhihang Yuan
Jiawei Liu
Jiaxiang Wu
Dawei Yang
Qiang Wu
Guangyu Sun
Wenyu Liu
Xinggang Wang
Bingzhe Wu
MQ
22
6
0
23 Mar 2023
Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems
Jemin Lee
Yongin Kwon
Sihyeong Park
Misun Yu
Jeman Park
Hwanjun Song
ViT
MQ
19
5
0
22 Mar 2023
Rotation Invariant Quantization for Model Compression
Dor-Joseph Kampeas
Yury Nahshan
Hanoch Kremer
Gil Lederman
Shira Zaloshinski
Zheng Li
E. Haleva
MQ
23
0
0
03 Mar 2023
BiBench: Benchmarking and Analyzing Network Binarization
Haotong Qin
Mingyuan Zhang
Yifu Ding
Aoyu Li
Zhongang Cai
Ziwei Liu
Feng Yu
Xianglong Liu
MQ
AAML
34
36
0
26 Jan 2023
PowerQuant: Automorphism Search for Non-Uniform Quantization
Edouard Yvinec
Arnaud Dapogny
Matthieu Cord
Kévin Bailly
MQ
20
15
0
24 Jan 2023
PD-Quant: Post-Training Quantization based on Prediction Difference Metric
Jiawei Liu
Lin Niu
Zhihang Yuan
Dawei Yang
Xinggang Wang
Wenyu Liu
MQ
96
68
0
14 Dec 2022
QFT: Post-training quantization via fast joint finetuning of all degrees of freedom
Alexander Finkelstein
Ella Fuchs
Idan Tal
Mark Grobman
Niv Vosco
Eldad Meller
MQ
29
6
0
05 Dec 2022
Post-training Quantization on Diffusion Models
Yuzhang Shang
Zhihang Yuan
Bin Xie
Bingzhe Wu
Yan Yan
DiffM
MQ
15
159
0
28 Nov 2022
CPT-V: A Contrastive Approach to Post-Training Quantization of Vision Transformers
N. Frumkin
Dibakar Gope
Diana Marculescu
ViT
MQ
29
1
0
17 Nov 2022
AskewSGD : An Annealed interval-constrained Optimisation method to train Quantized Neural Networks
Louis Leconte
S. Schechtman
Eric Moulines
29
4
0
07 Nov 2022
TPU-MLIR: A Compiler For TPU Using MLIR
Pengchao Hu
Man Lu
Lei Wang
Guoyue Jiang
14
5
0
23 Oct 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei
Yunchen Zhang
Xiangguo Zhang
Ruihao Gong
Shanghang Zhang
Qi Zhang
F. Yu
Xianglong Liu
MQ
36
145
0
27 Sep 2022
PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers
Zhikai Li
Mengjuan Chen
Junrui Xiao
Qingyi Gu
ViT
MQ
43
33
0
13 Sep 2022
A simple approach for quantizing neural networks
J. Maly
Rayan Saab
MQ
22
11
0
07 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
20
55
0
30 Aug 2022
Efficient Adaptive Activation Rounding for Post-Training Quantization
Zhengyi Li
Cong Guo
Zhanda Zhu
Yangjie Zhou
Yuxian Qiu
Xiaotian Gao
Jingwen Leng
Minyi Guo
MQ
30
3
0
25 Aug 2022
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
Elias Frantar
Sidak Pal Singh
Dan Alistarh
MQ
25
216
0
24 Aug 2022
FP8 Quantization: The Power of the Exponent
Andrey Kuzmin
M. V. Baalen
Yuwei Ren
Markus Nagel
Jorn W. T. Peters
Tijmen Blankevoort
MQ
25
78
0
19 Aug 2022
AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets
Zhaopeng Tu
Xinghao Chen
Pengju Ren
Yunhe Wang
MQ
36
54
0
17 Aug 2022
Mixed-Precision Neural Networks: A Survey
M. Rakka
M. Fouda
Pramod P. Khargonekar
Fadi J. Kurdahi
MQ
25
11
0
11 Aug 2022
Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach
Jiseok Youn
Jaehun Song
Hyung-Sin Kim
S. Bahk
MQ
13
8
0
20 Jul 2022
MAC-DO: An Efficient Output-Stationary GEMM Accelerator for CNNs Using DRAM Technology
Minki Jeong
Wanyeong Jung
9
0
0
16 Jul 2022
Previous
1
2
3
4
Next