ResearchTrend.AI

arXiv:2306.00978
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

1 June 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
    EDLMQ

Papers citing "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration"

50 / 425 papers shown
Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments
Qingyu Lu
Liang Ding
Siyi Cao
Xuebo Liu
Kanjian Zhang
Jinxia Zhang
Dacheng Tao
LLMAG
231
0
0
23 May 2025
Towards Practical Defect-Focused Automated Code Review
Junyi Lu
Lili Jiang
Xiaojia Li
Jianbing Fang
Fengjun Zhang
Li Yang
Chun Zuo
218
0
0
23 May 2025
LCD: Advancing Extreme Low-Bit Clustering for Large Language Models via Knowledge Distillation
Fangxin Liu
Ning Yang
Junping Zhao
Tao Yang
Haibing Guan
Li Jiang
MQ
46
0
0
23 May 2025
Smaller, Smarter, Closer: The Edge of Collaborative Generative AI
Roberto Morabito
SiYoung Jang
SyDa
49
0
0
22 May 2025
Edge-First Language Model Inference: Models, Metrics, and Tradeoffs
SiYoung Jang
Roberto Morabito
74
1
0
22 May 2025
LLM-Powered AI Agent Systems and Their Applications in Industry
Guannan Liang
Qianqian Tong
LLMAGLM&Ro
88
3
0
22 May 2025
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
David Dinucu-Jianu
Jakub Macina
Nico Daheim
Ido Hakimi
Iryna Gurevych
Mrinmaya Sachan
KELMLRM
110
0
0
21 May 2025
FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization
Fangxin Liu
Zongwu Wang
JinHong Xia
Junping Zhao
Jian Liu
Haibing Guan
Li Jiang
MQ
26
0
0
21 May 2025
Is (Selective) Round-To-Nearest Quantization All You Need?
Alex Kogan
MQ
44
0
0
21 May 2025
Activation-Guided Consensus Merging for Large Language Models
Yuxuan Yao
Shuqi Liu
Zehua Liu
Qintong Li
Mingyang Liu
Xiongwei Han
Zhijiang Guo
Han Wu
Linqi Song
MoMe
147
0
0
20 May 2025
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
74
0
0
20 May 2025
Through a Compressed Lens: Investigating the Impact of Quantization on LLM Explainability and Interpretability
Qianli Wang
Mingyang Wang
Nils Feldhus
Simon Ostermann
Yuan Cao
Hinrich Schütze
Sebastian Möller
Vera Schmitt
MQ
67
1
0
20 May 2025
FedHQ: Hybrid Runtime Quantization for Federated Learning
Zihao Zheng
Ziyao Wang
Xiuping Cui
Maoliang Li
Jiayu Chen
Liang
Ang Li
Xiang Chen
FedMLMQ
83
0
0
17 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su
Yuechi Zhou
Quantong Qiu
Jilong Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
88
1
0
16 May 2025
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Shihao Zhang
Haoyu Zhang
Ian Colbert
Rayan Saab
MQ
108
0
0
16 May 2025
Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks
Chiyue Wei
Bowen Duan
Cong Guo
Jing Zhang
Qingyue Song
Hai "Helen" Li
Yiran Chen
126
0
0
16 May 2025
InfiJanice: Joint Analysis and In-situ Correction Engine for Quantization-Induced Math Degradation in Large Language Models
Zhen Li
Yupeng Su
Songmiao Wang
Runming Yang
C. Xie
...
Ming Li
Jiannong Cao
Yuan Xie
Ngai Wong
Hongxia Yang
MQ
122
0
0
16 May 2025
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde
Martin Cousseau
Antoun Yaacoub
Lionel Prevost
MQ
101
0
0
12 May 2025
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang
Zijian Huang
Chenshun Ni
Ziyang Xiong
Jiasi Chen
Samet Oymak
ReLMLRM
178
3
0
12 May 2025
QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads
Khurram Mazher
Saad Bin Nasir
MQ
117
0
0
12 May 2025
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Patrick Blumenberg
Thomas Graave
Tim Fingscheidt
MQ
104
0
0
10 May 2025
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
132
2
0
09 May 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQMoE
473
1
0
09 May 2025
LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
Seunghee Han
S. Choi
Joo-Young Kim
61
0
0
09 May 2025
DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing
Nilesh Prasad Pandey
Shriniwas Kulkarni
David Wang
Onat Gungor
Flavio Ponzina
T. Rosing
97
0
0
08 May 2025
Diffusion Model Quantization: A Review
Qian Zeng
Chenggong Hu
Mingli Song
Jie Song
MQ
102
0
0
08 May 2025
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Wenhao Li
Bo Jin
Mingyi Hong
Changhong Lu
Xiangfeng Wang
166
0
0
07 May 2025
Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ
67
0
0
05 May 2025
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Sanjay Surendranath Girija
Shashank Kapoor
Lakshit Arora
Dipen Pradhan
Aman Raj
Ankit Shetgaonkar
167
0
0
05 May 2025
Real-time Spatial Retrieval Augmented Generation for Urban Environments
David Nazareno Campo
Javier Conde
Alvaro Alonso
Gabriel Huecas
Joaquín Salvachúa
Pedro Reviriego
69
0
0
04 May 2025
Quantizing Diffusion Models from a Sampling-Aware Perspective
Qian Zeng
Jie Song
Yuanyu Wan
Huiqiong Wang
Mingli Song
DiffMMQ
122
1
0
04 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Jianchao Tan
Lizhuang Ma
Jiangming Wang
Jun Wang
Weinan Zhang
Wei Zhang
MQ
93
0
0
01 May 2025
ICQuant: Index Coding enables Low-bit LLM Quantization
Xinlin Li
Osama A. Hanna
Christina Fragouli
Suhas Diggavi
MQ
158
1
0
01 May 2025
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu Zhang
Zechun Liu
Yuandong Tian
Harshit Khaitan
Ziyi Wang
Steven Li
109
3
0
28 Apr 2025
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
Xilong Xie
Liang Wang
Limin Xiao
Meng Han
Lin Sun
S. Zheng
Xiangrong Xu
MQ
93
0
0
28 Apr 2025
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
A. Zandieh
Majid Daliri
Majid Hadian
Vahab Mirrokni
MQ
132
0
0
28 Apr 2025
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
Zishen Wan
Jiayi Qian
Yuhang Du
Jason J. Jabbour
Yilun Du
Yang Katie Zhao
A. Raychowdhury
Tushar Krishna
Vijay Janapa Reddi
LM&Ro
197
1
0
26 Apr 2025
Pushing the boundary on Natural Language Inference
Pablo Miralles-González
Javier Huertas-Tato
Alejandro Martín
David Camacho
LRM
227
0
0
25 Apr 2025
L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Qingyuan Liu
Liyan Chen
Yanning Yang
Haoyu Wang
Dong Du
Zhigang Mao
Naifeng Jing
Yubin Xia
Haibo Chen
70
0
0
24 Apr 2025
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Ziqiang Liu
Dong Li
E. Barsoum
187
0
0
23 Apr 2025
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Coleman Hooper
Charbel Sakr
Ben Keller
Rangharajan Venkatesan
Kurt Keutzer
Siyang Song
Brucek Khailany
MQ
103
0
0
19 Apr 2025
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
Akshat Ramachandran
Souvik Kundu
Arnab Raha
Shamik Kundu
Deepak K. Mathaikutty
Tushar Krishna
69
1
0
19 Apr 2025
Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
Chaoyue Niu
Yucheng Ding
Junhui Lu
Zhengxiang Huang
Hang Zeng
Yutong Dai
Xuezhen Tu
Chengfei Lv
Fan Wu
Guihai Chen
133
1
0
17 Apr 2025
D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang
Qihua Zhou
Zicong Hong
Song Guo
MoE
83
0
0
17 Apr 2025
CSPLADE: Learned Sparse Retrieval with Causal Language Models
Zhichao Xu
Aosong Feng
Yijun Tian
Haibo Ding
Lin Leee Cheong
RALM
107
0
0
15 Apr 2025
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Tianyi Zhang
Yang Sui
Shaochen Zhong
Vipin Chaudhary
Helen Zhou
Anshumali Shrivastava
MQ
82
2
0
15 Apr 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
112
0
0
14 Apr 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi R. Fung
Paul Pu Liang
VLM
98
1
0
14 Apr 2025
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Deyu Cao
Samin Aref
MQ
91
0
0
14 Apr 2025
QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
Zongxian Yang
Jiayu Qian
Z. Huang
Kay Chen Tan
LM&MALRM
180
0
0
13 Apr 2025