Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.05079
Cited By
v1
v2 (latest)
Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
8 October 2023
Cheng Zhang
Jianyi Cheng
Ilia Shumailov
George A. Constantinides
Yiren Zhao
MQ
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?"
13 / 13 papers shown
Title
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
103
662
0
15 Aug 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
127
479
0
04 Jun 2022
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
288
2,521
0
20 Apr 2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai
Rangharajan Venkatesan
Haoxing Ren
B. Zimmer
W. Dally
Brucek Khailany
MQ
86
72
0
08 Feb 2021
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
214
227
0
31 Dec 2020
HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs
H. Habi
Roy H. Jennings
Arnon Netzer
MQ
68
65
0
20 Jul 2020
Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
Bichen Wu
Yanghan Wang
Peizhao Zhang
Yuandong Tian
Peter Vajda
Kurt Keutzer
MQ
74
273
0
30 Nov 2018
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Kuan-Chieh Wang
Zhijian Liu
Chengyue Wu
Ji Lin
Song Han
MQ
129
884
0
21 Nov 2018
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Dongqing Zhang
Jiaolong Yang
Dongqiangzi Ye
G. Hua
MQ
63
703
0
26 Jul 2018
Quantizing deep convolutional networks for efficient inference: A whitepaper
Raghuraman Krishnamoorthi
MQ
141
1,021
0
21 Jun 2018
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
242
1,413
0
31 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,196
0
20 Apr 2018
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
338
2,898
0
26 Sep 2016
1