AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

1 June 2023
Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han · EDL, MQ
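The entry above gives only the paper's metadata, so as a brief orientation for what the citing works build on: AWQ is a weight-only post-training quantization method that protects the small fraction of salient weight channels by scaling them up according to activation magnitude before low-bit, group-wise rounding, with the inverse scale folded back so the layer's output is preserved up to rounding error. The sketch below is a minimal NumPy illustration of that scaling idea, not the authors' implementation; the group size, the fixed exponent alpha = 0.5, the synthetic calibration data, and all function names are assumptions made for the example (the paper searches the scaling exponent per layer and pairs the quantized weights with optimized low-bit inference kernels).

```python
import numpy as np


def quantize_groupwise(w: np.ndarray, n_bits: int = 4, group_size: int = 128) -> np.ndarray:
    """Symmetric round-to-nearest fake-quantization with one scale per group of weights."""
    qmax = 2 ** (n_bits - 1) - 1
    out = np.empty_like(w)
    for start in range(0, w.shape[1], group_size):
        group = w[:, start:start + group_size]
        scale = np.abs(group).max(axis=1, keepdims=True) / qmax
        scale = np.where(scale == 0, 1.0, scale)             # guard all-zero groups
        out[:, start:start + group_size] = np.round(group / scale) * scale
    return out


def awq_style_quantize(w: np.ndarray, x_calib: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Scale weight columns by per-channel activation magnitude, quantize, then unscale.

    The real method folds the inverse scale into the preceding operation; here it is
    simply applied to the dequantized weights so the output error can be measured.
    """
    act_scale = np.abs(x_calib).mean(axis=0) ** alpha        # one scale per input channel
    act_scale = np.where(act_scale == 0, 1.0, act_scale)
    return quantize_groupwise(w * act_scale) / act_scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(512, 512))                          # (out_features, in_features)
    chan_mag = np.ones(512)
    chan_mag[rng.choice(512, size=8, replace=False)] = 50.0  # a few dominant channels,
    x = rng.normal(size=(256, 512)) * chan_mag               # as seen in LLM activations
    y_ref = x @ w.T
    err_rtn = np.linalg.norm(y_ref - x @ quantize_groupwise(w).T)
    err_awq = np.linalg.norm(y_ref - x @ awq_style_quantize(w, x).T)
    print(f"plain RTN error: {err_rtn:.1f}   activation-aware error: {err_awq:.1f}")
```

On strongly skewed channel magnitudes like the synthetic ones above, the activation-aware variant should typically show a lower output error than plain round-to-nearest at the same bit width, which is the effect the paper exploits.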

Papers citing "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration"

Showing 50 of 425 citing papers.
• QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
  Zongxian Yang, Jiayu Qian, Z. Huang, Kay Chen Tan · LM&MA, LRM · 13 Apr 2025
• MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
  Yichao Yuan, Lin Ma, Nishil Talati · MoE · 12 Apr 2025
• GPT Carry-On: Training Foundation Model for Customization Could Be Simple, Scalable and Affordable
  Jianqiao Wangni · 10 Apr 2025
• Cat, Rat, Meow: On the Alignment of Language Model and Human Term-Similarity Judgments
  Lorenz Linhardt, Tom Neuhäuser, Lenka Tětková, Oliver Eberle · ALM, AI4TS · 10 Apr 2025
• Resource-efficient Inference with Foundation Model Programs
  Lunyiu Nie, Zhimin Ding, Kevin Yu, Marco Cheung, C. Jermaine, S. Chaudhuri · 09 Apr 2025
• AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
  Yanbiao Liang, Huihong Shi, Haikuo Shao, Zhongfeng Wang · 07 Apr 2025
• Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression
  Ivan Ilin, Peter Richtárik · 06 Apr 2025
• Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency
  E. J. Husom, Arda Goknil, Merve Astekin, Lwin Khin Shar, Andre Kåsen, S. Sen, Benedikt Andreas Mithassel, Ahmet Soylu · MQ · 04 Apr 2025
• Entropy-Based Block Pruning for Efficient Large Language Models
  Liangwei Yang, Yuhui Xu, Juntao Tan, Doyen Sahoo, Siyang Song, Caiming Xiong, Han Wang, Shelby Heinecke · AAML · 04 Apr 2025
• MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
  Beichen Huang, Yueming Yuan, Zelei Shao, Minjia Zhang · MQ, MoE · 03 Apr 2025
• LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi
  Mahsa Ardakani, Jinendra Malekar, Ramtin Zand · MQ · 02 Apr 2025
• When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
  Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang · MQ, LRM · 02 Apr 2025
• Model Hemorrhage and the Robustness Limits of Large Language Models
  Ziyang Ma, Hui Yuan, Lefei Zhang, Gui-Song Xia, Bo Du, Liangpei Zhang, Dacheng Tao · 31 Mar 2025
• Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
  Wei Tao, Bin Zhang, Xiaoyang Qu, Jiguang Wan, Jianzong Wang · 30 Mar 2025
• RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm
  Yongyi Yang, Jianyang Gao, Wei Hu · MQ · 29 Mar 2025
• Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
  Hung-Yueh Chiang, Chi-chih Chang, N. Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu · MQ · 28 Mar 2025
• Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
  Tong Nie, Jian Sun, Wei Ma · 27 Mar 2025
• Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
  Han Wu, Yuxuan Yao, Shuqi Liu, Zehua Liu, Xiaojin Fu, Xiongwei Han, Xianrui Li, Hui-Ling Zhen, Tao Zhong, Mingxuan Yuan · MoMe, LRM · 26 Mar 2025
• QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition
  Yuxuan Hu, Xiaodong Chen, Cuiping Li, Hong Chen, Jing Zhang · MQ · 25 Mar 2025
• TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
  Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, ..., Jinghua Yan, Y. Bai, P. Sadayappan, Helen Zhou, Bo Yuan · VLM · 24 Mar 2025
• Payload-Aware Intrusion Detection with CMAE and Large Language Models
  Yongcheol Kim, Chanjae Lee, Young Yoon · 23 Mar 2025
• Large Language Model Compression via the Nested Activation-Aware Decomposition
  Jun Lu, Tianyi Xu, Bill Ding, David Li, Yu Kang · 21 Mar 2025
• Improving Quantization with Post-Training Model Expansion
  Giuseppe Franco, Pablo Monteagudo-Lago, Ian Colbert, Nicholas J. Fraser, Michaela Blott · MQ · 21 Mar 2025
• Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
  Keda Tao, Haoxuan You, Yang Sui, Can Qin, Haoyu Wang · VLM, MQ · 20 Mar 2025
• Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
  Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, ..., Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen · OffRL, ReLM, LRM · 20 Mar 2025
• ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
  E. Georganas, Dhiraj D. Kalamkar, Alexander Kozlov, A. Heinecke · MQ · 17 Mar 2025
• Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
  Yuanze Li, Shihao Yuan, Haolin Wang, Qizhang Li, Ming-Yu Liu, Chen Xu, Guangming Shi, Wangmeng Zuo · 17 Mar 2025
• ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning
  Baohao Liao, Christian Herold, Seyyed Hadi Hashemi, Stefan Vasilev, Shahram Khadivi, Christof Monz · MQ · 17 Mar 2025
• PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices
  Yangyijian Liu, Jun Yu Li, Wu-Jun Li · 15 Mar 2025
• PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
  Cheng Deng, Luoyang Sun, Jiwen Jiang, Yongcheng Zeng, Xinjian Wu, ..., Haoyang Li, Lei Chen, Lionel M. Ni, Jun Wang · 15 Mar 2025
• Are formal and functional linguistic mechanisms dissociated in language models?
  Michael Hanna, Sandro Pezzelle, Yonatan Belinkov · 14 Mar 2025
• ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
  Juncan Deng, Shuaiting Li, Zeyu Wang, Kedong Xu, Hong Gu, Kejie Huang · MQ · 12 Mar 2025
• Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge
  Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito · 12 Mar 2025
• SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
  Xun Liang, Hanyu Wang, Huayi Lai, Pengnian Qi, Shichao Song, Jiawei Yang, Jihao Zhao, Feiyu Xiong, Simin Niu, Zhiyu Li · VLM · 10 Mar 2025
• Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping
  Ning Ding, Jing Han, Yuchuan Tian, Chao Xu, Kai Han, Yehui Tang · MQ · 10 Mar 2025
• TR-DQ: Time-Rotation Diffusion Quantization
  Yihua Shao, Deyang Lin, Fanhu Zeng, Minxi Yan, Hao Fei, ..., Haozhe Wang, Jiaxin Guo, Yan Wang, Haotong Qin, Hao Tang · MQ, DiffM · 09 Mar 2025
• SAQ-SAM: Semantically-Aligned Quantization for Segment Anything Model
  Jing Zhang, Zhiyu Li, Qingyi Gu · MQ, VLM · 09 Mar 2025
• Wanda++: Pruning Large Language Models via Regional Gradients
  Yifan Yang, Kai Zhen, Bhavana Ganesh, Aram Galstyan, Goeric Huybrechts, ..., S. Bodapati, Nathan Susanj, Zheng Zhang, Jack FitzGerald, Abhishek Kumar · 06 Mar 2025
• Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size
  Alireza Behtash, Marijan Fofonjka, Ethan Baird, Tyler Mauer, Hossein Moghimifam, David Stout, Joel Dennison · MQ · 06 Mar 2025
• FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
  Hongchao Du, Shangyu Wu, Arina Kharlamova, Nan Guan, Chun Jason Xue · 04 Mar 2025
• Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
  Wenxuan Song, Jiayi Chen, Pengxiang Ding, Han Zhao, Wei Zhao, Zhide Zhong, Zongyuan Ge, Jun Ma, Haoang Li · 04 Mar 2025
• KurTail: Kurtosis-based LLM Quantization
  Mohammad Sadegh Akhondzadeh, Aleksandar Bojchevski, E. Eleftheriou, M. Dazzi · MQ · 03 Mar 2025
• DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems
  Minoo Hosseinzadeh, Hana Khamfroush · 03 Mar 2025
• PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
  Sunghyeon Woo, Sol Namkung, Sunwoo Lee, Inho Jeong, Beomseok Kim, Dongsuk Jeon · 28 Feb 2025
• Identifying Sensitive Weights via Post-quantization Integral
  Yuezhou Hu, Weiyu Huang, Zichen Liang, Chong Chen, Jintao Zhang, Jun Zhu, Jianfei Chen · MQ · 28 Feb 2025
• HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration
  Rohan Juneja, Shivam Aggarwal, Safeen Huda, Tulika Mitra, L. Peh · 27 Feb 2025
• Binary Neural Networks for Large Language Model: A Survey
  Liangdong Liu, Zhitong Zheng, Cong Wang, TianHuang Su, ZhenYu Yang · MQ · 26 Feb 2025
• A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs
  Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Angelica I Aviles-Rivero, Chuanlong Xie, Yao Zhu · 26 Feb 2025
• Compressing Language Models for Specialized Domains
  Miles Williams, G. Chrysostomou, Vitor Jeronymo, Nikolaos Aletras · MQ · 25 Feb 2025
• PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback
  Nils Wandel, David Stotko, Alexander Schier, Reinhard Klein · 25 Feb 2025