ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
arXiv 2306.00978 · 1 June 2023
Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han · EDLMQ

Papers citing "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration"

Showing 50 of 425 citing papers.
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou · MQ · 28 May 2024

CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian · MQ · 27 May 2024

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, ..., Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen · RALM · 25 May 2024

PTQ4DiT: Post-training Quantization for Diffusion Transformers
Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan · MQ · 25 May 2024

Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information
Yanshu Wang, Wenyang He, Tong Yang · MQ · 24 May 2024

Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang, Hayun Kim, Younghoon Kim · 23 May 2024

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang · MQ · 23 May 2024

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Qinshuo Liu, Xianglong Liu, Luca Benini, Michele Magno, Shiming Zhang, Xiaojuan Qi · MQ · 23 May 2024

TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan · MQ · 23 May 2024

OAC: Output-adaptive Calibration for Accurate Post-training Quantization
Ali Edalati, Alireza Ghaffari, M. Asgharian, Lu Hou, Boxing Chen, Vahid Partovi Nia · MQ · 23 May 2024

QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input
Senmao Tian, Haoyu Gao, Gangyi Hong, Shuyun Wang, JingJie Wang, Xin Yu, Shunli Zhang · MQ · 22 May 2024

Class-Conditional self-reward mechanism for improved Text-to-Image models
Safouane El Ghazouali, Arnaud Gucciardi, Umberto Michelucci · EGVM · 22 May 2024

AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs
Alireza Ghaffari, Sharareh Younesian, Vahid Partovi Nia, Boxing Chen, M. Asgharian · MQ · 22 May 2024

ReALLM: A general framework for LLM compression and fine-tuning
Louis Leconte, Lisa Bedin, Van Minh Nguyen, Eric Moulines · MQ · 21 May 2024

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Minsik Cho, Mohammad Rastegari, Devang Naik · 08 May 2024

Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities
Yagmur Yigit, M. Ferrag, Iqbal H. Sarker, Leandros A. Maglaras, Christos Chrysoulas, Naghmeh Moradpoor, Helge Janicke · 08 May 2024

PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, Xianglong Liu · VLMMQ · 06 May 2024

PatentGPT: A Large Language Model for Intellectual Property
Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, ..., Fu Bian, Xiaolong Gu, Lisha Zhang, Weilei Wang, Changyang Tu · 28 Apr 2024

How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training
Jaeseong You, Minseop Park, Kyunggeun Lee, Seokjun An, Chirag I. Patel, Markus Nagel · MQ · 25 Apr 2024

Towards Socially and Environmentally Responsible AI
Pengfei Li, Yejia Liu, Jianyi Yang, Shaolei Ren · 23 Apr 2024

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica · 22 Apr 2024

A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang · 22 Apr 2024

An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno · MQ · 22 Apr 2024

decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points
Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei Chen, Xiaogang Tian, Jinping Cai, Yang Zhang, Shouda Liu · MQ · 19 Apr 2024

Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy
Yijin Liu, Fandong Meng, Jie Zhou · AI4CE · 10 Apr 2024

Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models
Zihan Fang, Zheng Lin, Zhe Chen, Xianhao Chen, Yue Gao, Yuguang Fang · 09 Apr 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan, Songlin Yang, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Yikang Shen · MoE · 08 Apr 2024

Lossless and Near-Lossless Compression for Foundation Models
Moshik Hershcovitch, Leshem Choshen, Andrew Wood, Ilias Enmouri, Peter Chin, S. Sundararaman, Danny Harnik · 05 Apr 2024

Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Wanyun Cui, Qianle Wang · MQ · 03 Apr 2024

Octopus v2: On-device language model for super agent
Wei Chen, Zhiyuan Li · RALM · 02 Apr 2024

Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Luchang Li, Sheng Qian, Jie Lu, Lunxi Yuan, Rui Wang, Qin Xie · 29 Mar 2024

The Need for Speed: Pruning Transformers with One Recipe
Samir Khaki, Konstantinos N. Plataniotis · 26 Mar 2024

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
Youpeng Zhao, Di Wu, Jun Wang · 26 Mar 2024

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang · Mamba · 21 Mar 2024

AI and Memory Wall
A. Gholami, Z. Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer · 21 Mar 2024

AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Yong Li, Rongrong Ji · MQ · 19 Mar 2024

RouterBench: A Benchmark for Multi-LLM Routing System
Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay · 18 Mar 2024

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He, Jidong Zhai · 18 Mar 2024

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, ..., B. Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Yue Liu · 18 Mar 2024

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang · MQ · 12 Mar 2024

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin · MQ · 11 Mar 2024

What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation
Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan · MQ · 11 Mar 2024

QAQ: Quality Adaptive Quantization for LLM KV Cache
Shichen Dong, Wenfang Cheng, Jiayu Qin, Wei Wang · MQ · 07 Mar 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang · 05 Mar 2024

On the Compressibility of Quantized Large Language Models
Yu Mao, Weilan Wang, Hongchao Du, Nan Guan, Chun Jason Xue · MQ · 03 Mar 2024

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
Juntao Zhao, Borui Wan, Size Zheng, Yanghua Peng, Chuan Wu · MQ · 02 Mar 2024

CLLMs: Consistency Large Language Models
Siqi Kou, Lanxiang Hu, Zhe He, Zhijie Deng, Hao Zhang · 28 Feb 2024

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
J. Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, S. Kwon, Dongsoo Lee · MQ · 28 Feb 2024

OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine
Xiaosong Wang, Xiaofan Zhang, Guotai Wang, Junjun He, Zhongyu Li, ..., Jie Zhao, Kang Li, Xin Sun, Lifeng Zhu, Shaoting Zhang · LM&MAVLMMedIm · 28 Feb 2024

On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin · 28 Feb 2024