Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2206.01861
Cited By
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
4 June 2022
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers"
24 / 324 papers shown
Title
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Z. Yao
Xiaoxia Wu
Cheng-rong Li
Stephen Youn
Yuxiong He
MQ
63
57
0
15 Mar 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
369
0
13 Mar 2023
Speculative Decoding with Big Little Decoder
Sehoon Kim
K. Mangalam
Suhong Moon
Jitendra Malik
Michael W. Mahoney
A. Gholami
Kurt Keutzer
MoE
32
101
0
15 Feb 2023
Quantized Distributed Training of Large Models with Convergence Guarantees
I. Markov
Adrian Vladu
Qi Guo
Dan Alistarh
MQ
36
11
0
05 Feb 2023
Alternating Updates for Efficient Transformers
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
MoE
48
5
0
30 Jan 2023
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Xiaoxia Wu
Cheng-rong Li
Reza Yazdani Aminabadi
Z. Yao
Yuxiong He
MQ
19
19
0
27 Jan 2023
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar
Dan Alistarh
VLM
33
638
0
02 Jan 2023
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
27
214
0
19 Dec 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
MQ
61
741
0
18 Nov 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar
Saleh Ashkboos
Torsten Hoefler
Dan Alistarh
MQ
33
898
0
31 Oct 2022
Fast DistilBERT on CPUs
Haihao Shen
Ofir Zafrir
Bo Dong
Hengyu Meng
Xinyu. Ye
Zhe Wang
Yi Ding
Hanwen Chang
Guy Boudoukh
Moshe Wasserblat
VLM
29
2
0
27 Oct 2022
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Quchen Fu
Szu-Wei Fu
Yaran Fan
Yu-Huan Wu
Zhuo Chen
J. Gupchup
Ross Cutler
34
0
0
24 Oct 2022
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
VLM
MoE
LRM
29
20
0
20 Oct 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
34
630
0
15 Aug 2022
Quantization Robust Federated Learning for Efficient Inference on Heterogeneous Devices
Kartik Gupta
Marios Fournarakis
M. Reisser
Christos Louizos
Markus Nagel
FedML
22
15
0
22 Jun 2022
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park
Baeseong Park
Minsub Kim
Sungjae Lee
Jeonghoon Kim
Beomseok Kwon
S. Kwon
Byeongwook Kim
Youngjoo Lee
Dongsoo Lee
MQ
18
73
0
20 Jun 2022
What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
AI4CE
42
9
0
22 May 2022
Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai
Lu Hou
Lifeng Shang
Xin Jiang
Irwin King
M. Lyu
MQ
82
47
0
30 Sep 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
105
341
0
05 Jan 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
1,996
0
31 Dec 2020
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
142
221
0
31 Dec 2020
ANLIzing the Adversarial Natural Language Inference Dataset
Adina Williams
Tristan Thrush
Douwe Kiela
AAML
179
46
0
24 Oct 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
233
576
0
12 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018
Previous
1
2
3
4
5
6
7