ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.00978
  4. Cited By
AWQ: Activation-aware Weight Quantization for LLM Compression and
  Acceleration
v1v2v3v4v5 (latest)

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

1 June 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
    EDLMQ
ArXiv (abs)PDFHTML

Papers citing "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration"

50 / 425 papers shown
Title
Progressive Binarization with Semi-Structured Pruning for LLMs
Progressive Binarization with Semi-Structured Pruning for LLMs
Xinyu Yan
Tianao Zhang
Zhiteng Li
Yulun Zhang
MQ
155
1
0
01 Jul 2025
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Yushen Chen
Jiawei Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
97
1
0
01 Jul 2025
Thunder-Tok: Minimizing Tokens per Word in Tokenizing Korean Texts for Generative Language Models
Thunder-Tok: Minimizing Tokens per Word in Tokenizing Korean Texts for Generative Language Models
Gyeongje Cho
Yeonkyoun So
Chanwoo Park
Sangmin Lee
Sungmok Jung
Jaejin Lee
VLM
38
0
0
18 Jun 2025
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
Wenxuan Song
Jiayi Chen
Pengxiang Ding
Yuxin Huang
Han Zhao
Donglin Wang
Haoang Li
35
0
0
16 Jun 2025
ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models
ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models
Junho Yoon
Geom Lee
Donghyeon Jeon
Inho Kang
Seung-Hoon Na
MQVLM
51
0
0
16 Jun 2025
On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving
On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving
Pedram MohajerAnsari
Amir Salarpour
Michael Kuhr
Siyu Huang
Mohammad Hamad
Sebastian Steinhorst
Habeeb Olufowobi
Mert D. Pesé
AAML
28
0
0
13 Jun 2025
SlotPi: Physics-informed Object-centric Reasoning Models
SlotPi: Physics-informed Object-centric Reasoning Models
Jian Li
Wan Han
Ning Lin
Yu-Liang Zhan
Ruizhi Chengze
...
Yi-Feng Zhang
Hongsheng Liu
Zidong Wang
Fan Yu
Hao Sun
OCLLRMAI4CE
133
0
0
12 Jun 2025
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
Yeonju Ro
Zhenyu Zhang
Souvik Kundu
Zhangyang Wang
Aditya Akella
112
0
0
11 Jun 2025
ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
Amirreza Rouhi
Solmaz Arezoomandan
Knut Peterson
Joseph T. Woods
David Han
VLM
49
0
0
10 Jun 2025
Real-Time Execution of Action Chunking Flow Policies
Real-Time Execution of Action Chunking Flow Policies
Kevin Black
Manuel Y. Galliker
Sergey Levine
OffRL
35
0
0
09 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
46
0
0
09 Jun 2025
MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
Wei Tao
Haocheng Lu
Xiaoyang Qu
Bin Zhang
Kai Lu
Jiguang Wan
Jianzong Wang
MQMoE
27
0
0
09 Jun 2025
BAQ: Efficient Bit Allocation Quantization for Large Language Models
BAQ: Efficient Bit Allocation Quantization for Large Language Models
Chao Zhang
Li Wang
S. Lasaulce
Mérouane Debbah
MQ
77
0
0
06 Jun 2025
EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model
EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model
Alyssa Pinnock
Shakya Jayakody
Kawsher A Roxy
Md Rubel Ahmed
43
0
0
06 Jun 2025
ADAMIX: Adaptive Mixed-Precision Delta-Compression with Quantization Error Optimization for Large Language Models
ADAMIX: Adaptive Mixed-Precision Delta-Compression with Quantization Error Optimization for Large Language Models
Boya Xiong
Shuo Wang
Weifeng Ge
Guanhua Chen
Yun-Nung Chen
MQ
36
0
0
05 Jun 2025
Kinetics: Rethinking Test-Time Scaling Laws
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan
Zhuoming Chen
Haizhong Zheng
Yang Zhou
Emma Strubell
Beidi Chen
123
0
0
05 Jun 2025
Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models
Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models
Seungcheol Park
Jeongin Bae
Beomseok Kwon
Minjun Kim
Byeongwook Kim
S. Kwon
U. Kang
Dongsoo Lee
MQ
154
0
0
04 Jun 2025
Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information
Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information
Seungcheol Park
Sojin Lee
Jongjin Kim
Jinsik Lee
Hyunjik Jo
U. Kang
83
2
0
04 Jun 2025
MANBench: Is Your Multimodal Model Smarter than Human?
MANBench: Is Your Multimodal Model Smarter than Human?
Han Zhou
Qitong Xu
Yiheng Dong
Xin Yang
28
0
0
04 Jun 2025
UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection
UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection
Jigang Fan
Quanlin Wu
Shengjie Luo
Liwei Wang
29
0
0
03 Jun 2025
Pruning General Large Language Models into Customized Expert Models
Pruning General Large Language Models into Customized Expert Models
Yirao Zhao
Guizhen Chen
Kenji Kawaguchi
Lidong Bing
Wenxuan Zhang
80
0
0
03 Jun 2025
Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition
Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition
Yoonjun Cho
Soeun Kim
Dongjae Jeon
Kyelim Lee
Beomsoo Lee
Albert No
MQ
40
0
0
02 Jun 2025
TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network
TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network
Guangxin He
Yuan Cao
Yutong He
Tianyi Bai
Kun Yuan
Binhang Yuan
MQ
64
0
0
02 Jun 2025
EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
Zekun Wang
Minghua Ma
Zexin Wang
Rongchuan Mu
Liping Shan
Ming Liu
Bing Qin
VLM
46
0
0
31 May 2025
Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation
Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation
Muhammad Adnan
Nithesh Kurella
Akhil Arunkumar
Prashant J. Nair
DiffMVGen
41
0
0
31 May 2025
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou
S. Wang
Xingyu Dong
Xiangqi Jin
Yifang Chen
Yue Min
Kexin Yang
Xingzhang Ren
Dayiheng Liu
Linfeng Zhang
OffRLLRM
37
0
0
31 May 2025
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
Banseok Lee
Dongkyu Kim
Youngcheon You
Youngmin Kim
MQ
37
0
0
30 May 2025
Research on Driving Scenario Technology Based on Multimodal Large Lauguage Model Optimization
Research on Driving Scenario Technology Based on Multimodal Large Lauguage Model Optimization
Wang Mengjie
Zhu Huiping
Li Jian
Shi Wenxiu
Zhang Song
31
0
0
28 May 2025
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Ba-Hien Tran
Van Minh Nguyen
MQ
66
0
0
28 May 2025
ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi
Zhenglun Kong
Geng Yuan
Shaoyi Huang
59
0
0
28 May 2025
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
Yudi Zhang
Weilin Zhao
Xu Han
Tiejun Zhao
Wang Xu
Hailong Cao
Conghui Zhu
MQ
60
1
0
28 May 2025
TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
Xiangyu Chen
Jing Liu
Ye Wang
Matthew Brand
Wang
T. Koike-Akino
77
0
0
27 May 2025
DLP: Dynamic Layerwise Pruning in Large Language Models
DLP: Dynamic Layerwise Pruning in Large Language Models
Yuli Chen
B. Cheng
Jiale Han
Yingying Zhang
Yingting Li
Shuhao Zhang
56
0
0
27 May 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao
Keda Tao
Can Qin
Haoxuan You
Yang Sui
Huan Wang
VLM
77
0
0
27 May 2025
Efficient Large Language Model Inference with Neural Block Linearization
Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan
F. Tonin
Volkan Cevher
83
0
0
27 May 2025
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu
Yi Ge
Yichen You
Enshu Liu
Zhihang Yuan
Guohao Dai
Shengen Yan
Huazhong Yang
Yu Wang
MoELRM
75
1
0
27 May 2025
FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
Daehyeon Baek
Jieun Choi
Jimyoung Son
Kyungmin Bin
Seungbeom Choi
Kihyo Moon
Minsung Jang
Hyojung Lee
MQ
25
0
0
27 May 2025
Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits
Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits
Yeshwanth Venkatesha
Souvik Kundu
Priyadarshini Panda
52
0
0
27 May 2025
BASE-Q: Bias and Asymmetric Scaling Enhanced Rotational Quantization for Large Language Models
Liulu He
Shenli Zhen
Karwei Sun
Yijiang Liu
Yufei Zhao
Chongkang Tan
Huanrui Yang
Yuan Du
Li Du
MQ
23
0
0
26 May 2025
ResSVD: Residual Compensated SVD for Large Language Model Compression
ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai
Siyong Jian
Tuo Liang
Yu Yin
Huan Wang
55
0
0
26 May 2025
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
Dong Liu
Jiayi Zhang
Yifan Li
Yanxuan Yu
Ben Lengerich
Ying Nian Wu
78
1
0
26 May 2025
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
Dingyu Yao
Bowen Shen
Zheng Lin
Wei Liu
Jian Luan
Bin Wang
Weiping Wang
MQ
79
0
0
26 May 2025
Does quantization affect models' performance on long-context tasks?
Does quantization affect models' performance on long-context tasks?
Anmol Mekala
Anirudh Atmakuru
Yixiao Song
Marzena Karpinska
Mohit Iyyer
MQ
70
0
0
26 May 2025
FP4 All the Way: Fully Quantized Training of LLMs
FP4 All the Way: Fully Quantized Training of LLMs
Brian Chmiel
Maxim Fishman
Ron Banner
Daniel Soudry
MQ
89
0
0
25 May 2025
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Junyu Chen
Junzhuo Li
Zhen Peng
Wenjie Wang
Yuxiang Ren
Long Shi
Xuming Hu
MQ
40
0
0
24 May 2025
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
Hao Gu
Lujun Li
Zheyu Wang
B. Liu
Qiyuan Zhu
Sirui Han
Yike Guo
MQ
34
0
0
24 May 2025
Why Do Some Inputs Break Low-Bit LLM Quantization?
Why Do Some Inputs Break Low-Bit LLM Quantization?
Ting-Yun Chang
Muru Zhang
Jesse Thomason
Robin Jia
MQ
34
0
0
24 May 2025
$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
μμμ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
T. Koike-Akino
Jing Liu
Ye Wang
MoE
41
0
0
24 May 2025
Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing
Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing
Zhaoyuan Su
Tingfeng Lan
Zirui Wang
Juncheng Yang
Yue Cheng
29
0
0
24 May 2025
Towards Practical Defect-Focused Automated Code Review
Towards Practical Defect-Focused Automated Code Review
Junyi Lu
Lili Jiang
Xiaojia Li
Jianbing Fang
Fengjun Zhang
Li Yang
Chun Zuo
218
0
0
23 May 2025
123456789
Next