arXiv:2112.14938
Automatic Mixed-Precision Quantization Search of BERT
30 December 2021
Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin
MQ
Papers citing "Automatic Mixed-Precision Quantization Search of BERT" (13 papers)
A Hybrid Approach to Information Retrieval and Answer Generation for Regulatory Texts
Jhon Rayo, Raul de la Rosa, Mario Garrido
AILaw
39
0
0
24 Feb 2025
The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
Seyed Parsa Neshaei, Yasaman Boreshban, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel
MQ
41
0
0
08 Mar 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
MQ
41
48
0
15 Feb 2024
A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao
41
29
0
05 Feb 2024
SimQ-NAS: Simultaneous Quantization Policy and Neural Architecture Search
S. N. Sridhar, Maciej Szankin, Fang Chen, Sairam Sundaresan, Anthony Sarah
MQ
27
0
0
19 Dec 2023
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Jordan Dotzel, Gang Wu, Andrew Li, M. Umar, Yun Ni, ..., Liqun Cheng, Martin G. Dixon, N. Jouppi, Quoc V. Le, Sheng Li
MQ
38
3
0
07 Aug 2023
A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization
Edward Fish, Umberto Michieli, Mete Ozay
MQ
30
4
0
24 Jul 2023
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
45
63
0
16 Jul 2023
Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari S. Morcos, Ali Farhadi, Ludwig Schmidt
MQ
MLLM
VLM
24
39
0
25 Apr 2023
Numerical Optimizations for Weighted Low-rank Estimation on Language Model
Ting Hua, Yen-Chang Hsu, Felicity Wang, Qian Lou, Yilin Shen, Hongxia Jin
27
13
0
02 Nov 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers, M. Lewis, Younes Belkada, Luke Zettlemoyer
MQ
34
635
0
15 Aug 2022
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
MQ
236
576
0
12 Sep 2019
Neural Architecture Search with Reinforcement Learning
Barret Zoph, Quoc V. Le
271
5,330
0
05 Nov 2016