arXiv:2306.00978
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
1 June 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL
MQ
Papers citing
"AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration"
Showing 50 of 425 citing papers
Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments
Qingyu Lu
Liang Ding
Siyi Cao
Xuebo Liu
Kanjian Zhang
Jinxia Zhang
Dacheng Tao
LLMAG
231
0
0
23 May 2025
Towards Practical Defect-Focused Automated Code Review
Junyi Lu
Lili Jiang
Xiaojia Li
Jianbing Fang
Fengjun Zhang
Li Yang
Chun Zuo
218
0
0
23 May 2025
LCD: Advancing Extreme Low-Bit Clustering for Large Language Models via Knowledge Distillation
Fangxin Liu
Ning Yang
Junping Zhao
Tao Yang
Haibing Guan
Li Jiang
MQ
46
0
0
23 May 2025
Smaller, Smarter, Closer: The Edge of Collaborative Generative AI
Roberto Morabito
SiYoung Jang
SyDa
49
0
0
22 May 2025
Edge-First Language Model Inference: Models, Metrics, and Tradeoffs
SiYoung Jang
Roberto Morabito
74
1
0
22 May 2025
LLM-Powered AI Agent Systems and Their Applications in Industry
Guannan Liang
Qianqian Tong
LLMAG
LM&Ro
88
3
0
22 May 2025
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
David Dinucu-Jianu
Jakub Macina
Nico Daheim
Ido Hakimi
Iryna Gurevych
Mrinmaya Sachan
KELM
LRM
110
0
0
21 May 2025
FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization
Fangxin Liu
Zongwu Wang
JinHong Xia
Junping Zhao
Jian Liu
Haibing Guan
Li Jiang
MQ
26
0
0
21 May 2025
Is (Selective) Round-To-Nearest Quantization All You Need?
Alex Kogan
MQ
44
0
0
21 May 2025
Activation-Guided Consensus Merging for Large Language Models
Yuxuan Yao
Shuqi Liu
Zehua Liu
Qintong Li
Mingyang Liu
Xiongwei Han
Zhijiang Guo
Han Wu
Linqi Song
MoMe
147
0
0
20 May 2025
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
74
0
0
20 May 2025
Through a Compressed Lens: Investigating the Impact of Quantization on LLM Explainability and Interpretability
Qianli Wang
Mingyang Wang
Nils Feldhus
Simon Ostermann
Yuan Cao
Hinrich Schütze
Sebastian Möller
Vera Schmitt
MQ
67
1
0
20 May 2025
FedHQ: Hybrid Runtime Quantization for Federated Learning
Zihao Zheng
Ziyao Wang
Xiuping Cui
Maoliang Li
Jiayu Chen
Liang
Ang Li
Xiang Chen
FedML
MQ
83
0
0
17 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su
Yuechi Zhou
Quantong Qiu
Jilong Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
88
1
0
16 May 2025
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Shihao Zhang
Haoyu Zhang
Ian Colbert
Rayan Saab
MQ
108
0
0
16 May 2025
Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks
Chiyue Wei
Bowen Duan
Cong Guo
Jing Zhang
Qingyue Song
Hai "Helen" Li
Yiran Chen
126
0
0
16 May 2025
InfiJanice: Joint Analysis and In-situ Correction Engine for Quantization-Induced Math Degradation in Large Language Models
Zhen Li
Yupeng Su
Songmiao Wang
Runming Yang
C. Xie
...
Ming Li
Jiannong Cao
Yuan Xie
Ngai Wong
Hongxia Yang
MQ
122
0
0
16 May 2025
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde
Martin Cousseau
Antoun Yaacoub
Lionel Prevost
MQ
101
0
0
12 May 2025
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang
Zijian Huang
Chenshun Ni
Ziyang Xiong
Jiasi Chen
Samet Oymak
ReLM
LRM
178
3
0
12 May 2025
QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads
Khurram Mazher
Saad Bin Nasir
MQ
117
0
0
12 May 2025
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Patrick Blumenberg
Thomas Graave
Tim Fingscheidt
MQ
104
0
0
10 May 2025
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
132
2
0
09 May 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
473
1
0
09 May 2025
LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
Seunghee Han
S. Choi
Joo-Young Kim
61
0
0
09 May 2025
DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing
Nilesh Prasad Pandey
Shriniwas Kulkarni
David Wang
Onat Gungor
Flavio Ponzina
T. Rosing
97
0
0
08 May 2025
Diffusion Model Quantization: A Review
Qian Zeng
Chenggong Hu
Mingli Song
Jie Song
MQ
102
0
0
08 May 2025
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Wenhao Li
Bo Jin
Mingyi Hong
Changhong Lu
Xiangfeng Wang
166
0
0
07 May 2025
Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ
67
0
0
05 May 2025
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Sanjay Surendranath Girija
Shashank Kapoor
Lakshit Arora
Dipen Pradhan
Aman Raj
Ankit Shetgaonkar
167
0
0
05 May 2025
Real-time Spatial Retrieval Augmented Generation for Urban Environments
David Nazareno Campo
Javier Conde
Alvaro Alonso
Gabriel Huecas
Joaquín Salvachúa
Pedro Reviriego
69
0
0
04 May 2025
Quantizing Diffusion Models from a Sampling-Aware Perspective
Qian Zeng
Jie Song
Yuanyu Wan
Huiqiong Wang
Mingli Song
DiffM
MQ
122
1
0
04 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Jianchao Tan
Lizhuang Ma
Jiangming Wang
Jun Wang
Weinan Zhang
Wei Zhang
MQ
93
0
0
01 May 2025
ICQuant: Index Coding enables Low-bit LLM Quantization
Xinlin Li
Osama A. Hanna
Christina Fragouli
Suhas Diggavi
MQ
158
1
0
01 May 2025
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu Zhang
Zechun Liu
Yuandong Tian
Harshit Khaitan
Ziyi Wang
Steven Li
109
3
0
28 Apr 2025
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
Xilong Xie
Liang Wang
Limin Xiao
Meng Han
Lin Sun
S. Zheng
Xiangrong Xu
MQ
93
0
0
28 Apr 2025
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
A. Zandieh
Majid Daliri
Majid Hadian
Vahab Mirrokni
MQ
132
0
0
28 Apr 2025
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Zishen Wan
Jiayi Qian
Yuhang Du
Jason J. Jabbour
Yilun Du
Yang Katie Zhao
A. Raychowdhury
Tushar Krishna
Vijay Janapa Reddi
LM&Ro
197
1
0
26 Apr 2025
Pushing the boundary on Natural Language Inference
Pablo Miralles-González
Javier Huertas-Tato
Alejandro Martín
David Camacho
LRM
227
0
0
25 Apr 2025
L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Qingyuan Liu
Liyan Chen
Yanning Yang
Haoyu Wang
Dong Du
Zhigang Mao
Naifeng Jing
Yubin Xia
Haibo Chen
70
0
0
24 Apr 2025
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Ziqiang Liu
Dong Li
E. Barsoum
187
0
0
23 Apr 2025
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Coleman Hooper
Charbel Sakr
Ben Keller
Rangharajan Venkatesan
Kurt Keutzer
Siyang Song
Brucek Khailany
MQ
103
0
0
19 Apr 2025
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
Akshat Ramachandran
Souvik Kundu
Arnab Raha
Shamik Kundu
Deepak K. Mathaikutty
Tushar Krishna
69
1
0
19 Apr 2025
Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
Chaoyue Niu
Yucheng Ding
Junhui Lu
Zhengxiang Huang
Hang Zeng
Yutong Dai
Xuezhen Tu
Chengfei Lv
Fan Wu
Guihai Chen
133
1
0
17 Apr 2025
D²MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang
Qihua Zhou
Zicong Hong
Song Guo
MoE
83
0
0
17 Apr 2025
CSPLADE: Learned Sparse Retrieval with Causal Language Models
Zhichao Xu
Aosong Feng
Yijun Tian
Haibo Ding
Lin Lee Cheong
RALM
107
0
0
15 Apr 2025
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Tianyi Zhang
Yang Sui
Shaochen Zhong
Vipin Chaudhary
Helen Zhou
Anshumali Shrivastava
MQ
82
2
0
15 Apr 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
112
0
0
14 Apr 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi R. Fung
Paul Pu Liang
VLM
98
1
0
14 Apr 2025
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Deyu Cao
Samin Aref
MQ
91
0
0
14 Apr 2025
QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
Zongxian Yang
Jiayu Qian
Z. Huang
Kay Chen Tan
LM&MA
LRM
180
0
0
13 Apr 2025