ResearchTrend.AI

arXiv:2306.00978 · Cited By
v5 (latest)

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

1 June 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL, MQ

Papers citing "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration"

50 / 425 papers shown
SAM Decoding: Speculative Decoding via Suffix Automaton
Yuxuan Hu
Ke Wang
Jing Zhang
Fanjin Zhang
Cuiping Li
Hong Chen
162
5
0
16 Nov 2024
TEESlice: Protecting Sensitive Neural Network Models in Trusted Execution Environments When Attackers have Pre-Trained Models
Ding Li
Ziqi Zhang
Mengyu Yao
Y. Cai
Yao Guo
Xiangqun Chen
FedML
75
2
0
15 Nov 2024
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization
Weibo Zhao
Yubin Shi
Xinyu Lyu
Wanchen Sui
Shen Li
MQ
78
1
0
12 Nov 2024
WDMoE: Wireless Distributed Mixture of Experts for Large Language Models
Nan Xue
Yaping Sun
Zhiyong Chen
Meixia Tao
Xiaodong Xu
Liang Qian
Shuguang Cui
Wenjun Zhang
Ping Zhang
MoE
39
1
0
11 Nov 2024
The Super Weight in Large Language Models
Mengxia Yu
De Wang
Qi Shan
Colorado Reed
Alvin Wan
MQ, MILM
88
13
0
11 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
Cengiz Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin, MoMe
113
29
0
07 Nov 2024
Interactions Across Blocks in Post-Training Quantization of Large Language Models
Khasmamad Shabanovi
Lukas Wiest
Vladimir Golkov
Daniel Cremers
Thomas Pfeil
MQ
75
1
0
06 Nov 2024
The Unreasonable Effectiveness of LLMs for Query Optimization
Peter Akioyamen
Zixuan Yi
Ryan Marcus
45
3
0
05 Nov 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin
Yanzhao Wu
166
5
0
05 Nov 2024
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Xuanlin Jiang
Yang Zhou
Shiyi Cao
Ion Stoica
Minlan Yu
77
11
0
02 Nov 2024
A Comprehensive Study on Quantization Techniques for Large Language Models
Jiedong Lang
Zhehao Guo
Shuyu Huang
MQ
100
12
0
30 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Charbel Sakr
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
MQ
130
0
0
28 Oct 2024
Watermarking Large Language Models and the Generated Content: Opportunities and Challenges
Ruisi Zhang
F. Koushanfar
WaLM
91
1
0
24 Oct 2024
CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation
Qinsi Wang
Saeed Vahidian
Hancheng Ye
Jianyang Gu
Jianyi Zhang
Yiran Chen
38
4
0
23 Oct 2024
Optical Generative Models
Shiqi Chen
Yuhang Li
Hanlong Chen
Aydogan Ozcan
VLM
64
1
0
23 Oct 2024
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference
Xin He
Shunkang Zhang
Yuxin Wang
Haiyan Yin
Zihao Zeng
Shaohuai Shi
Zhenheng Tang
Xiaowen Chu
Ivor Tsang
Ong Yew Soon
MoE
102
7
0
23 Oct 2024
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
497
0
0
22 Oct 2024
Catastrophic Failure of LLM Unlearning via Quantization
Zhiwei Zhang
Fali Wang
Xiaomin Li
Zongyu Wu
Xianfeng Tang
Hui Liu
Qi He
Wenpeng Yin
Suhang Wang
MU
113
18
0
21 Oct 2024
Opportunities and Challenges of Generative-AI in Finance
Akshar Prabhu Desai
Ganesh Satish Mallya
Mohammad Luqman
Tejasvi Ravi
Nithya Kota
Pranjul Yadav
AIFin
127
4
0
21 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo
Druv Pai
Yu Bai
Jiantao Jiao
Michael I. Jordan
Song Mei
84
14
0
17 Oct 2024
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng
Zhang Cao
Huaizhi Qu
Zhengyu Zhang
Chang Guo
Yanyong Zhang
Zhichao Cao
Tianlong Chen
112
2
0
17 Oct 2024
AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations
Qian Tao
Wenyuan Yu
Jingren Zhou
MQ
75
4
0
17 Oct 2024
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Hao Sun
Liwei Wang
LRM
94
13
0
17 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
127
3
0
16 Oct 2024
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Akriti Jain
Saransh Sharma
Koyel Mukherjee
Soumyabrata Pal
80
1
0
16 Oct 2024
LLM2Swarm: Robot Swarms that Responsively Reason, Plan, and Collaborate through LLMs
Volker Strobel
Marco Dorigo
Mario Fritz
LRM
114
4
0
15 Oct 2024
SLaNC: Static LayerNorm Calibration
Mahsa Salmani
Nikita Trukhanov
I. Soloveychik
MQ
58
0
0
14 Oct 2024
KV Prediction for Improved Time to First Token
Maxwell Horton
Qingqing Cao
Chenfan Sun
Yanzi Jin
Sachin Mehta
Mohammad Rastegari
Moin Nabi
AI4TS
85
4
0
10 Oct 2024
Animating the Past: Reconstruct Trilobite via Video Generation
Xiaoran Wu
Zien Huang
Chonghan Yu
VGen
97
1
0
10 Oct 2024
CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression
Wenyuan Liu
Xindian Ma
Peng Zhang
Yan Wang
MQ
66
1
0
10 Oct 2024
Scaling Laws For Mixed Quantization
Zeyu Cao
Boyang Gu
Cheng Zhang
Pedro Gimenes
Jianqiao Lu
Jianyi Cheng
Xitong Gao
Yiren Zhao
MQ
89
1
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai (Helen) Li
Yiran Chen
67
7
0
08 Oct 2024
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang
Jeffrey T. H. Wong
Can Xiao
George A. Constantinides
Yiren Zhao
MQ
83
4
0
08 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
128
4
0
08 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
206
21
0
06 Oct 2024
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models
Houquan Zhou
Zhenghua Li
Bo Zhang
Chen Li
Shaopeng Lai
Ji Zhang
Fei Huang
Hao Fei
LRM
104
5
0
05 Oct 2024
LoRTA: Low Rank Tensor Adaptation of Large Language Models
Ignacio Hounie
Charilaos I. Kanatsoulis
Arnuv Tandon
Alejandro Ribeiro
187
0
0
05 Oct 2024
What do Large Language Models Need for Machine Translation Evaluation?
Shenbin Qian
Archchana Sindhujan
Minnie Kabra
Diptesh Kanojia
Constantin Orasan
Tharindu Ranasinghe
Frédéric Blain
ELM, LRM, ALM, LM&MA
77
1
0
04 Oct 2024
ARB-LLM: Alternating Refined Binarizations for Large Language Models
Zhiteng Li
Xinyu Yan
Tianao Zhang
Haotong Qin
Dong Xie
Jiang Tian
Zhongchao Shi
Linghe Kong
Yulun Zhang
Xiaokang Yang
MQ
99
8
0
04 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
178
3
0
02 Oct 2024
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang
Yu-Guang Chen
Ing-Chao Lin
Bing Li
Grace Li Zhang
93
4
0
02 Oct 2024
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
Yi Xiong
Hao Wu
Changxu Shao
Ziqing Wang
Rui Zhang
Yuhong Guo
Junping Zhao
Ke Zhang
Zhenxuan Pan
82
6
0
01 Oct 2024
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Shaobo Ma
Chao Fang
Haikuo Shao
Zhongfeng Wang
101
4
0
26 Sep 2024
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
Yifan Tan
Haoze Wang
Chao Yan
Yangdong Deng
MQ
84
2
0
25 Sep 2024
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD, LRM
179
59
0
24 Sep 2024
Deploying Open-Source Large Language Models: A performance Analysis
Yannis Bendi-Ouis
Dan Dutarte
Xavier Hinaut
ALM
57
2
0
23 Sep 2024
DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation
Xuewen Liu
Zhikai Li
Minhao Jiang
Mengjuan Chen
Jianquan Li
Qingyi Gu
MQ
95
5
0
22 Sep 2024
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Yuxin Wang
Minghua Ma
Zekun Wang
Jingchang Chen
Huiming Fan
Liping Shan
Qing Yang
Dongliang Xu
Ming Liu
Bing Qin
79
4
0
20 Sep 2024
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
Yanshu Wang
Tong Yang
Xiyan Liang
Guoan Wang
Hanning Lu
Xu Zhe
Yaoming Li
Li Weitao
MQ
99
3
0
18 Sep 2024
KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models
Bo Lv
Quan Zhou
Xuanang Ding
Yan Wang
Zeming Ma
VLM
67
2
0
17 Sep 2024