ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
v5 (latest)

1 June 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL MQ
ArXiv (abs) · PDF · HTML

Papers citing "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration"

50 / 425 papers shown
Uncertainty Quantification in Retrieval Augmented Question Answering
Laura Perez-Beltrachini
Mirella Lapata
RALM
163
0
0
25 Feb 2025
LightThinker: Thinking Step-by-Step Compression
Jintian Zhang
Yuqi Zhu
Mengshu Sun
Yujie Luo
Shuofei Qiao
Lun Du
Da Zheng
Ningyu Zhang
LRM LLMAG
140
34
0
24 Feb 2025
CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng
Dianwen Mei
Huishi Qiu
Xujie Chen
Li Liu
Jiang Tian
Zhongchao Shi
196
0
0
24 Feb 2025
Optimizing Singular Spectrum for Large Language Model Compression
Dengjie Li
Tiancheng Shen
Yao Zhou
Baisong Yang
Zhongying Liu
Masheng Yang
Guohao Li
Yibo Yang
Yujie Zhong
Ming-Hsuan Yang
88
1
0
24 Feb 2025
NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models
Yibo Zhong
Haoxiang Jiang
Lincan Li
Ryumei Nakada
Tianci Liu
Linjun Zhang
Huaxiu Yao
Haoyu Wang
259
3
0
24 Feb 2025
LLM Inference Acceleration via Efficient Operation Fusion
Mahsa Salmani
I. Soloveychik
97
0
0
24 Feb 2025
Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation
Deokhyung Kang
Jeonghun Cho
Yejin Jeon
Sunbin Jang
Minsub Lee
Jawoon Cho
Gary Lee
88
0
0
23 Feb 2025
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding
Wentao Jiang
Shunyu Liu
Yongcheng Jing
Jinpei Guo
...
Zengmao Wang
Ziqiang Liu
Di Lin
Xianglong Liu
Dacheng Tao
LRM
128
11
0
22 Feb 2025
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang
Bairu Hou
Wei Wei
Yujia Bao
Shiyu Chang
VLM
190
3
0
21 Feb 2025
MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures
Jiayu Qin
Jianchao Tan
Xunliang Cai
Wei Wang
77
0
0
19 Feb 2025
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
Artyom Kharinaev
Viktor Moskvoretskii
Egor Shvetsov
Kseniia Studenikina
Bykov Mikhail
Evgeny Burnaev
MQ
111
0
0
18 Feb 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
MQ ALM
218
0
0
18 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
264
2
0
18 Feb 2025
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
Jiashuo Wang
Hansong Zhou
Ting Song
Shijie Cao
Yan Xia
Ting Cao
Jianyu Wei
Shuming Ma
Hongyu Wang
Furu Wei
129
1
0
17 Feb 2025
DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services
Ting Sun
Penghan Wang
Fan Lai
107
0
0
17 Feb 2025
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
H. Seo
Wongi Jeong
Jae-sun Seo
Se Young Chun
145
0
0
12 Feb 2025
Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
Kunal Handa
Alex Tamkin
Miles McCain
Saffron Huang
Esin Durmus
...
Kevin K. Troy
Dario Amodei
Jared Kaplan
Jack Clark
Deep Ganguli
MLAU
115
1
0
11 Feb 2025
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Eric Aubinais
Philippe Formont
Pablo Piantanida
Elisabeth Gassiat
114
1
0
10 Feb 2025
Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents
Chenyang Shao
Xinyuan Hu
Yutang Lin
Fengli Xu
LLMAG LRM
152
9
0
06 Feb 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari
Haocheng Xi
Aditya Tomar
Coleman Hooper
Sehoon Kim
Maxwell Horton
Mahyar Najibi
Michael W. Mahoney
Kemal Kurniawan
Amir Gholami
MQ
112
5
0
05 Feb 2025
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
241
1
0
02 Feb 2025
Optimizing Large Language Model Training Using FP4 Quantization
Ruizhe Wang
Yeyun Gong
Xiao Liu
Guoshuai Zhao
Ziyue Yang
Baining Guo
Zhengjun Zha
Peng Cheng
MQ
205
12
0
28 Jan 2025
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian
Wayne Xin Zhao
Ji-Rong Wen
MQ
82
0
0
22 Jan 2025
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja
Siyang Song
Haneul Yoo
Jiho Jin
Kyunghyun Cho
Alice Oh
AI4TS VLM
69
0
0
21 Jan 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
561
2
0
15 Jan 2025
FlexQuant: Elastic Quantization Framework for Locally Hosted LLM on Edge Devices
Yuji Chai
Mujin Kwen
David Brooks
Gu-Yeon Wei
MQ
94
3
0
13 Jan 2025
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory
Jerry Chee
A. Backurs
Rainie Heck
Li Zhang
Janardhan Kulkarni
Thomas Rothvoss
Sivakanth Gopi
MQ
145
1
0
11 Jan 2025
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Zhen Li
Yupeng Su
Runming Yang
C. Xie
Zehua Wang
Zhongwei Xie
Ngai Wong
Hongxia Yang
MQ LRM
188
4
0
06 Jan 2025
Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications
Zhe Chen
Yusheng Liao
Shuyang Jiang
Pingjie Wang
Yu Guo
Yucheng Wang
Yu Wang
112
3
0
05 Jan 2025
DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
Y. Park
Jake Hyun
Hojoon Kim
Jae W. Lee
MQ
135
0
0
28 Dec 2024
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
Binrui Zeng
Shezheng Song
Xiaodong Liu
Jie Yu
Huijun Liu
Jun Ma
Xiaopeng Li
Shasha Li
Xinran Hong
Yongtao Tang
MQ
130
1
0
24 Dec 2024
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng
Songwei Liu
Shu Yang
Fangmin Chen
Xing Mei
Lean Fu
MQ
135
0
0
23 Dec 2024
FineGates: LLMs Finetuning with Compression using Stochastic Gates
Jonathan Svirsky
Yehonathan Refael
Ofir Lindenbaum
130
1
0
17 Dec 2024
DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations
Wenhao Hu
Paul Henderson
José Cano
MQ
93
0
0
12 Dec 2024
Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang
Huanrui Yang
MQ
184
1
0
08 Dec 2024
SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization
Runsheng Bai
Qiang Liu
B. Liu
MQ
139
2
0
05 Dec 2024
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang
Hui Chen
Jianchao Tan
Xunliang Cai
Zijia Lin
Jiawei Han
Jungong Han
Guiguang Ding
VLM
180
3
0
04 Dec 2024
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
290
7
0
04 Dec 2024
Multi-Bin Batching for Increasing LLM Inference Throughput
Ozgur Guldogan
Jackson Kunde
Kangwook Lee
Ramtin Pedarsani
LRM
137
2
0
03 Dec 2024
MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Akshat Sharma
Hangliang Ding
Jianping Li
Neel Dani
Minjia Zhang
175
1
0
27 Nov 2024
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Xu Ouyang
Tao Ge
Thomas Hartvigsen
Zhisong Zhang
Haitao Mi
Dong Yu
MQ
153
5
0
26 Nov 2024
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Libo Zhu
Jiajian Li
Haotong Qin
Wenbo Li
Yulun Zhang
Yong Guo
Xiaokang Yang
DiffM MQ
128
3
0
26 Nov 2024
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
Yu Zhang
Ming Wang
Lancheng Zou
Wulong Liu
Hui-Ling Zhen
Mingxuan Yuan
Bei Yu
MQ
97
1
0
25 Nov 2024
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Chao Fang
Man Shi
Robin Geens
Arne Symons
Zhongfeng Wang
Marian Verhelst
154
2
0
24 Nov 2024
Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu
Hao Cheng
Yujie Fang
Zeyu Wang
Jiaheng Wei
Dongwei Xu
Qi Xuan
Xiaoniu Yang
Zhaowei Zhu
130
4
0
23 Nov 2024
freePruner: A Training-free Approach for Large Multimodal Model Acceleration
Bingxin Xu
Yuzhang Shang
Yunhao Ge
Qian Lou
Yan Yan
146
3
0
23 Nov 2024
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Zehua Pei
Hui-Ling Zhen
Xianzhi Yu
Sinno Jialin Pan
Mingxuan Yuan
Bei Yu
AI4CE
256
3
0
21 Nov 2024
Bi-Mamba: Towards Accurate 1-Bit State Space Models
Shengkun Tang
Liqun Ma
Haoyang Li
Mingjie Sun
Zhiqiang Shen
Mamba
129
3
0
18 Nov 2024
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen
Ahmed F. AbouElhamayed
Xilai Dai
Yang Wang
Marta Andronic
George A. Constantinides
Mohamed S. Abdelfattah
MQ
153
2
0
18 Nov 2024
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao
Wendong Bu
Bingchen Miao
Yang Wu
Yunfei Li
Juncheng Billy Li
Siliang Tang
Qi Wu
Yueting Zhuang
Meng Wang
LM&Ro
113
3
0
17 Nov 2024