Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.10438
Cited By
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
18 November 2022
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models"
34 / 534 papers shown
Title
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
Yijia Zhang
Lingran Zhao
Shijie Cao
Wenqiang Wang
Ting Cao
Fan Yang
Mao Yang
Shanghang Zhang
Ningyi Xu
MQ
29
17
0
21 May 2023
Standalone 16-bit Neural Network Training: Missing Study for Hardware-Limited Deep Learning Practitioners
Juyoung Yun
Byungkon Kang
Francois Rameau
Zhoulai Fu
Zhoulai Fu
MQ
18
1
0
18 May 2023
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
Zhaozhuo Xu
Zirui Liu
Beidi Chen
Yuxin Tang
Jue Wang
Kaixiong Zhou
Xia Hu
Anshumali Shrivastava
MQ
32
29
0
17 May 2023
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Guangxuan Xiao
Tianwei Yin
William T. Freeman
F. Durand
Song Han
VGen
DiffM
56
239
0
17 May 2023
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Zeyu Wang
...
Chunan Shi
Zhuoming Chen
Daiyaan Arfeen
Reyna Abhyankar
Zhihao Jia
LRM
65
122
0
16 May 2023
Fast Distributed Inference Serving for Large Language Models
Bingyang Wu
Yinmin Zhong
Zili Zhang
Gang Huang
Xuanzhe Liu
Xin Jin
40
93
0
10 May 2023
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Lingjiao Chen
Matei A. Zaharia
James Zou
LLMAG
32
212
0
09 May 2023
CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation
J. Heo
S. Azizi
A. Fayyazi
Massoud Pedram
28
0
0
08 May 2023
Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
24
39
0
25 Apr 2023
RPTQ: Reorder-based Post-training Quantization for Large Language Models
Zhihang Yuan
Lin Niu
Jia-Wen Liu
Wenyu Liu
Xinggang Wang
Yuzhang Shang
Guangyu Sun
Qiang Wu
Jiaxiang Wu
Bingzhe Wu
MQ
35
79
0
03 Apr 2023
FP8 versus INT8 for efficient deep learning inference
M. V. Baalen
Andrey Kuzmin
Suparna S. Nair
Yuwei Ren
E. Mahurin
...
Sundar Subramanian
Sanghyuk Lee
Markus Nagel
Joseph B. Soriaga
Tijmen Blankevoort
MQ
31
45
0
31 Mar 2023
When Brain-inspired AI Meets AGI
Lin Zhao
Lu Zhang
Zihao Wu
Yuzhong Chen
Haixing Dai
...
Xi Jiang
Xiang Li
Dajiang Zhu
Dinggang Shen
Tianming Liu
AI4CE
32
89
0
28 Mar 2023
Unit Scaling: Out-of-the-Box Low-Precision Training
Charlie Blake
Douglas Orr
Carlo Luschi
MQ
24
7
0
20 Mar 2023
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Z. Yao
Xiaoxia Wu
Cheng-rong Li
Stephen Youn
Yuxiong He
MQ
63
57
0
15 Mar 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
371
0
13 Mar 2023
Training-Free Acceleration of ViTs with Delayed Spatial Merging
J. Heo
Seyedarmin Azizi
A. Fayyazi
Massoud Pedram
44
3
0
04 Mar 2023
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu
Qihang Zhao
Guoqi Li
Jason Eshraghian
BDL
VLM
33
84
0
27 Feb 2023
With Shared Microexponents, A Little Shifting Goes a Long Way
Bita Darvish Rouhani
Ritchie Zhao
V. Elango
Rasoul Shafipour
Mathew Hall
...
Eric S. Chung
Zhaoxia Deng
S. Naghshineh
Jongsoo Park
Maxim Naumov
MQ
43
38
0
16 Feb 2023
Offsite-Tuning: Transfer Learning without Full Model
Guangxuan Xiao
Ji Lin
Song Han
43
67
0
09 Feb 2023
Quantized Distributed Training of Large Models with Convergence Guarantees
I. Markov
Adrian Vladu
Qi Guo
Dan Alistarh
MQ
36
11
0
05 Feb 2023
Oscillation-free Quantization for Low-bit Vision Transformers
Shi Liu
Zechun Liu
Kwang-Ting Cheng
MQ
26
34
0
04 Feb 2023
The Hidden Power of Pure 16-bit Floating-Point Neural Networks
Juyoung Yun
Byungkon Kang
Zhoulai Fu
MQ
26
1
0
30 Jan 2023
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Xiaoxia Wu
Cheng-rong Li
Reza Yazdani Aminabadi
Z. Yao
Yuxiong He
MQ
19
19
0
27 Jan 2023
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar
Dan Alistarh
VLM
35
638
0
02 Jan 2023
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
27
218
0
19 Dec 2022
LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models
Simin Chen
Cong Liu
Mirazul Haque
Wei Yang
34
21
0
07 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
275
1,077
0
05 Oct 2022
Efficient Adaptive Activation Rounding for Post-Training Quantization
Zhengyi Li
Cong Guo
Zhanda Zhu
Yangjie Zhou
Yuxian Qiu
Xiaotian Gao
Jingwen Leng
Minyi Guo
MQ
32
4
0
25 Aug 2022
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park
Baeseong Park
Minsub Kim
Sungjae Lee
Jeonghoon Kim
Beomseok Kwon
S. Kwon
Byeongwook Kim
Youngjoo Lee
Dongsoo Lee
MQ
21
74
0
20 Jun 2022
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
107
344
0
05 Jan 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
2,000
0
31 Dec 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,833
0
17 Sep 2019
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
236
576
0
12 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,996
0
20 Apr 2018
Previous
1
2
3
...
10
11
9