arXiv:2306.00978 (v5, latest)
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
1 June 2023
Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han
Tags: EDL, MQ
Papers citing "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration" (25 of 425 shown)
- Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
  Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, ..., Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu
  08 Oct 2023

- Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
  Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou
  07 Oct 2023 · MQ

- The Role of Federated Learning in a Wireless World with Foundation Models
  Zihan Chen, Howard H. Yang, Y. C. Tay, Kai Fong Ernest Chong, Tony Q.S. Quek
  06 Oct 2023 · AI4CE

- Compressing LLMs: The Truth is Rarely Pure and Never Simple
  Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang
  02 Oct 2023 · MQ

- Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile
  Samuel Carreira, Tomás Marques, J. Ribeiro, Carlos Grilo
  29 Sep 2023

- PB-LLM: Partially Binarized Large Language Models
  Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong
  29 Sep 2023 · MQ

- Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
  Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang
  29 Sep 2023

- LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
  Ayush Kaushal, Tejas Vaidhya, Irina Rish
  25 Sep 2023

- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv, Yi Liu
  11 Sep 2023 · MQ

- Understanding the Impact of Post-Training Quantization on Large Language Models
  Somnath Roy
  11 Sep 2023 · MQ

- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
  Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu
  06 Sep 2023 · MQ

- QuantEase: Optimization-based Quantization for Language Models
  Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, S. Keerthi, Rahul Mazumder
  05 Sep 2023 · MQ

- FPTQ: Fine-grained Post-Training Quantization for Large Language Models
  Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li-Qiang Du, Yuchen Xie
  30 Aug 2023 · MQ

- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
  Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqiang Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo
  25 Aug 2023 · MQ

- Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
  Lequn Chen, Weixin Deng, Anirudh Canumalla, Yu Xin, Danyang Zhuo, Matthai Philipose, Arvind Krishnamurthy
  14 Aug 2023

- Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
  Minsoo Kim, Sihwa Lee, Jangwhan Lee, S. Hong, Duhyeuk Chang, Wonyong Sung, Jungwook Choi
  13 Aug 2023 · MQ

- ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
  Xiaoxia Wu, Z. Yao, Yuxiong He
  19 Jul 2023 · MQ

- Mini-Giants: "Small" Language Models and Open Source Win-Win
  Zhengping Zhou, Lezhi Li, Xinxi Chen, Andy Li
  17 Jul 2023 · SyDa, ALM, MoE

- Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
  Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
  22 Jun 2023 · MQ

- SqueezeLLM: Dense-and-Sparse Quantization
  Sehoon Kim, Coleman Hooper, A. Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
  13 Jun 2023 · MQ

- CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
  Dachuan Shi, Chaofan Tao, Anyi Rao, Zhendong Yang, Chun Yuan, Jiaqi Wang
  27 May 2023 · VLM

- Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
  Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Ruihao Gong, Jian Ren, Zhengang Li
  18 Apr 2023 · MQ

- Training-Free Acceleration of ViTs with Delayed Spatial Merging
  J. Heo, Seyedarmin Azizi, A. Fayyazi, Massoud Pedram
  04 Mar 2023

- ZipLM: Inference-Aware Structured Pruning of Language Models
  Eldar Kurtic, Elias Frantar, Dan Alistarh
  07 Feb 2023 · MQ

- LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
  Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, S. Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee
  20 Jun 2022 · MQ