arXiv: 2306.00978 (v5, latest)
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
1 June 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL
MQ
Papers citing "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration" (50 of 425 shown)
SAM Decoding: Speculative Decoding via Suffix Automaton
Yuxuan Hu
Ke Wang
Jing Zhang
Fanjin Zhang
Cuiping Li
Hong Chen
16 Nov 2024
TEESlice: Protecting Sensitive Neural Network Models in Trusted Execution Environments When Attackers have Pre-Trained Models
Ding Li
Ziqi Zhang
Mengyu Yao
Y. Cai
Yao Guo
Xiangqun Chen
FedML
15 Nov 2024
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization
Weibo Zhao
Yubin Shi
Xinyu Lyu
Wanchen Sui
Shen Li
MQ
12 Nov 2024
WDMoE: Wireless Distributed Mixture of Experts for Large Language Models
Nan Xue
Yaping Sun
Zhiyong Chen
Meixia Tao
Xiaodong Xu
Liang Qian
Shuguang Cui
Wenjun Zhang
Ping Zhang
MoE
11 Nov 2024
The Super Weight in Large Language Models
Mengxia Yu
De Wang
Qi Shan
Colorado Reed
Alvin Wan
MQ
MILM
11 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
Cengiz Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
07 Nov 2024
Interactions Across Blocks in Post-Training Quantization of Large Language Models
Khasmamad Shabanovi
Lukas Wiest
Vladimir Golkov
Daniel Cremers
Thomas Pfeil
MQ
06 Nov 2024
The Unreasonable Effectiveness of LLMs for Query Optimization
Peter Akioyamen
Zixuan Yi
Ryan Marcus
05 Nov 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin
Yanzhao Wu
05 Nov 2024
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Xuanlin Jiang
Yang Zhou
Shiyi Cao
Ion Stoica
Minlan Yu
02 Nov 2024
A Comprehensive Study on Quantization Techniques for Large Language Models
Jiedong Lang
Zhehao Guo
Shuyu Huang
MQ
30 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Charbel Sakr
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
MQ
28 Oct 2024
Watermarking Large Language Models and the Generated Content: Opportunities and Challenges
Ruisi Zhang
F. Koushanfar
WaLM
24 Oct 2024
CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation
Qinsi Wang
Saeed Vahidian
Hancheng Ye
Jianyang Gu
Jianyi Zhang
Yiran Chen
23 Oct 2024
Optical Generative Models
Shiqi Chen
Yuhang Li
Hanlong Chen
Aydogan Ozcan
VLM
23 Oct 2024
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference
Xin He
Shunkang Zhang
Yuxin Wang
Haiyan Yin
Zihao Zeng
Shaohuai Shi
Zhenheng Tang
Xiaowen Chu
Ivor Tsang
Ong Yew Soon
MoE
23 Oct 2024
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
22 Oct 2024
Catastrophic Failure of LLM Unlearning via Quantization
Zhiwei Zhang
Fali Wang
Xiaomin Li
Zongyu Wu
Xianfeng Tang
Hui Liu
Qi He
Wenpeng Yin
Suhang Wang
MU
21 Oct 2024
Opportunities and Challenges of Generative-AI in Finance
Akshar Prabhu Desai
Ganesh Satish Mallya
Mohammad Luqman
Tejasvi Ravi
Nithya Kota
Pranjul Yadav
AIFin
21 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo
Druv Pai
Yu Bai
Jiantao Jiao
Michael I. Jordan
Song Mei
17 Oct 2024
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng
Zhang Cao
Huaizhi Qu
Zhengyu Zhang
Chang Guo
Yanyong Zhang
Zhichao Cao
Tianlong Chen
17 Oct 2024
AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations
Qian Tao
Wenyuan Yu
Jingren Zhou
MQ
17 Oct 2024
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Hao Sun
Liwei Wang
LRM
17 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
16 Oct 2024
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Akriti Jain
Saransh Sharma
Koyel Mukherjee
Soumyabrata Pal
16 Oct 2024
LLM2Swarm: Robot Swarms that Responsively Reason, Plan, and Collaborate through LLMs
Volker Strobel
Marco Dorigo
Mario Fritz
LRM
15 Oct 2024
SLaNC: Static LayerNorm Calibration
Mahsa Salmani
Nikita Trukhanov
I. Soloveychik
MQ
14 Oct 2024
KV Prediction for Improved Time to First Token
Maxwell Horton
Qingqing Cao
Chenfan Sun
Yanzi Jin
Sachin Mehta
Mohammad Rastegari
Moin Nabi
AI4TS
10 Oct 2024
Animating the Past: Reconstruct Trilobite via Video Generation
Xiaoran Wu
Zien Huang
Chonghan Yu
VGen
10 Oct 2024
CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression
Wenyuan Liu
Xindian Ma
Peng Zhang
Yan Wang
MQ
10 Oct 2024
Scaling Laws for Mixed Quantization
Zeyu Cao
Boyang Gu
Cheng Zhang
Pedro Gimenes
Jianqiao Lu
Jianyi Cheng
Xitong Gao
Yiren Zhao
MQ
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai (Helen) Li
Yiran Chen
08 Oct 2024
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang
Jeffrey T. H. Wong
Can Xiao
George A. Constantinides
Yiren Zhao
MQ
08 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
08 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
06 Oct 2024
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models
Houquan Zhou
Zhenghua Li
Bo Zhang
Chen Li
Shaopeng Lai
Ji Zhang
Fei Huang
Hao Fei
LRM
05 Oct 2024
LoRTA: Low Rank Tensor Adaptation of Large Language Models
Ignacio Hounie
Charilaos I. Kanatsoulis
Arnuv Tandon
Alejandro Ribeiro
05 Oct 2024
What do Large Language Models Need for Machine Translation Evaluation?
Shenbin Qian
Archchana Sindhujan
Minnie Kabra
Diptesh Kanojia
Constantin Orasan
Tharindu Ranasinghe
Frédéric Blain
ELM
LRM
ALM
LM&MA
04 Oct 2024
ARB-LLM: Alternating Refined Binarizations for Large Language Models
Zhiteng Li
Xinyu Yan
Tianao Zhang
Haotong Qin
Dong Xie
Jiang Tian
Zhongchao Shi
Linghe Kong
Yulun Zhang
Xiaokang Yang
MQ
04 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
02 Oct 2024
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang
Yu-Guang Chen
Ing-Chao Lin
Bing Li
Grace Li Zhang
02 Oct 2024
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
Yi Xiong
Hao Wu
Changxu Shao
Ziqing Wang
Rui Zhang
Yuhong Guo
Junping Zhao
Ke Zhang
Zhenxuan Pan
01 Oct 2024
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Shaobo Ma
Chao Fang
Haikuo Shao
Zhongfeng Wang
26 Sep 2024
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
Yifan Tan
Haoze Wang
Chao Yan
Yangdong Deng
MQ
25 Sep 2024
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD
LRM
24 Sep 2024
Deploying Open-Source Large Language Models: A Performance Analysis
Yannis Bendi-Ouis
Dan Dutarte
Xavier Hinaut
ALM
23 Sep 2024
DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation
Xuewen Liu
Zhikai Li
Minhao Jiang
Mengjuan Chen
Jianquan Li
Qingyi Gu
MQ
22 Sep 2024
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Yuxin Wang
Minghua Ma
Zekun Wang
Jingchang Chen
Huiming Fan
Liping Shan
Qing Yang
Dongliang Xu
Ming Liu
Bing Qin
20 Sep 2024
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
Yanshu Wang
Tong Yang
Xiyan Liang
Guoan Wang
Hanning Lu
Xu Zhe
Yaoming Li
Li Weitao
MQ
18 Sep 2024
KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models
Bo Lv
Quan Zhou
Xuanang Ding
Yan Wang
Zeming Ma
VLM
17 Sep 2024