LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
arXiv 2305.17888 · 29 May 2023
Zechun Liu, Barlas Oğuz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra
MQ

Papers citing "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models" (50 of 58 shown)

Progressive Binarization with Semi-Structured Pruning for LLMs
Xinyu Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang
MQ · 142 · 1 · 0 · 01 Jul 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, ..., Zhiyuan Liu, Guoyang Zeng, Chao Jia, Dahai Li, Maosong Sun
MLLM · 32 · 0 · 0 · 09 Jun 2025

ADAMIX: Adaptive Mixed-Precision Delta-Compression with Quantization Error Optimization for Large Language Models
Boya Xiong, Shuo Wang, Weifeng Ge, Guanhua Chen, Yun-Nung Chen
MQ · 29 · 0 · 0 · 05 Jun 2025

MANBench: Is Your Multimodal Model Smarter than Human?
Han Zhou, Qitong Xu, Yiheng Dong, Xin Yang
19 · 0 · 0 · 04 Jun 2025

TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network
Guangxin He, Yuan Cao, Yutong He, Tianyi Bai, Kun Yuan, Binhang Yuan
MQ · 57 · 0 · 0 · 02 Jun 2025

QuantFace: Low-Bit Post-Training Quantization for One-Step Diffusion Face Restoration
Jiatong Li, Libo Zhu, Haotong Qin, Jingkai Wang, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang
DiffM · MQ · 48 · 0 · 0 · 01 Jun 2025

LittleBit: Ultra Low-Bit Quantization via Latent Factorization
Banseok Lee, Dongkyu Kim, Youngcheon You, Youngmin Kim
MQ · 21 · 0 · 0 · 30 May 2025

Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Ba-Hien Tran, Van Minh Nguyen
MQ · 58 · 0 · 0 · 28 May 2025

LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Junyu Chen, Junzhuo Li, Zhen Peng, Wenjie Wang, Yuxiang Ren, Long Shi, Xuming Hu
MQ · 33 · 0 · 0 · 24 May 2025

BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
Hao Gu, Lujun Li, Zheyu Wang, B. Liu, Qiyuan Zhu, Sirui Han, Yike Guo
MQ · 20 · 0 · 0 · 24 May 2025

NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics
Zhihang Cai, Xingjun Zhang, Zhendong Tan, Zheng Wei
MQ · 197 · 0 · 0 · 22 May 2025

Large Language Models for Time Series Analysis: Techniques, Applications, and Challenges
Feifei Shi, Xueyan Yin, Kang Wang, Wanyu Tu, Qifu Sun, Huansheng Ning
AI4TS · 22 · 0 · 0 · 21 May 2025

Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Enbo Zhao, Yi Shen, Shuming Shi, Jieyun Huang, Z. Chen, Rongjia Du, Siqi Xiao, Jing Zhang, Ning Wang, Shiguo Lian
MQ · 151 · 0 · 0 · 05 May 2025

PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An, Huajun Bai, Ziqiang Liu, Dong Li, E. Barsoum
178 · 0 · 0 · 23 Apr 2025

Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Deyu Cao, Samin Aref
MQ · 86 · 0 · 0 · 14 Apr 2025

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
MQ · 107 · 0 · 0 · 13 Apr 2025

When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang
MQ · LRM · 161 · 4 · 0 · 02 Apr 2025

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang, Chi-chih Chang, N. Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu
MQ · 499 · 0 · 0 · 28 Mar 2025

Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang, Jia Wei, Jintao Zhang, Jun-Jie Zhu, Jianfei Chen
MQ · 171 · 9 · 0 · 11 Mar 2025

QArtSR: Quantization via Reverse-Module and Timestep-Retraining in One-Step Diffusion based Image Super-Resolution
Libo Zhu, Haotong Qin, Kaicheng Yang, Wenbo Li, Yong Guo, Yulun Zhang, Susanto Rahardja, Xiaokang Yang
MQ · DiffM · 133 · 0 · 0 · 07 Mar 2025

SpinQuant: LLM quantization with learned rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort
MQ · 261 · 126 · 0 · 21 Feb 2025

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie
MQ · 246 · 2 · 0 · 18 Feb 2025

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, ..., Yuandong Tian, Bilge Soran, Raghuraman Krishnamoorthi, Tijmen Blankevoort, Vikas Chandra
MQ · 172 · 10 · 0 · 04 Feb 2025

HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
Armand Foucault, Franck Mamalet, François Malgouyres
MQ · 298 · 0 · 0 · 28 Jan 2025

Optimizing Large Language Model Training Using FP4 Quantization
Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng
MQ · 199 · 12 · 0 · 28 Jan 2025

Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang, Huanrui Yang
MQ · 182 · 1 · 0 · 08 Dec 2024

MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Akshat Sharma, Hangliang Ding, Jianping Li, Neel Dani, Minjia Zhang
170 · 1 · 0 · 27 Nov 2024

Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu, Hao Cheng, Yujie Fang, Zeyu Wang, Jiaheng Wei, Dongwei Xu, Qi Xuan, Xiaoniu Yang, Zhaowei Zhu
123 · 4 · 0 · 23 Nov 2024

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah
MQ · 140 · 2 · 0 · 18 Nov 2024

EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu, Huck Yang, Nai Chit Fung, Charbel Sakr, Hongxu Yin, ..., Jan Kautz, Yu-Chun Wang, Pavlo Molchanov, Min-Hung Chen
MQ · 127 · 0 · 0 · 28 Oct 2024

Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen, Bike Xie, Jundong Li, Cong Shen
MQ · 114 · 3 · 0 · 16 Oct 2024

Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
Ruijia Niu, D. Wu, Rose Yu, Yi-An Ma
124 · 2 · 0 · 09 Oct 2024

Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi
MoE · 112 · 4 · 0 · 08 Oct 2024

The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
Linke Song, Zixuan Pang, Wenhao Wang, Zihao Wang, XiaoFeng Wang, Hongbo Chen, Wei Song, Yier Jin, Dan Meng, Rui Hou
145 · 8 · 0 · 30 Sep 2024

Inverted Activations
Georgii Sergeevich Novikov, Ivan Oseledets
33 · 0 · 0 · 22 Jul 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Ping Luo
MQ · 158 · 35 · 0 · 10 Jul 2024

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang
158 · 56 · 0 · 09 Jul 2024

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing, Boyan Gao, Zheng Zhang, David A. Clifton, Shitao Xiao, Li Du, Guoqi Li, Jiajun Zhang
153 · 6 · 0 · 05 Jul 2024

Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zongzhang Zhang, Di He
KELM · 105 · 1 · 0 · 03 Jul 2024

BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
Jacob Nielsen, Peter Schneider-Kamp
MQ · 69 · 5 · 0 · 24 Jun 2024

Large Language Model Pruning
Hanjuan Huang, Hao-Jia Song, H. Pao
115 · 0 · 0 · 24 May 2024

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang
MQ · 118 · 34 · 0 · 23 May 2024

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Qinshuo Liu, Xianglong Liu, Luca Benini, Michele Magno, Shiming Zhang, Xiaojuan Qi
MQ · 137 · 19 · 0 · 23 May 2024

OAC: Output-adaptive Calibration for Accurate Post-training Quantization
Ali Edalati, Alireza Ghaffari, M. Asgharian, Lu Hou, Boxing Chen, Vahid Partovi Nia
MQ · 171 · 0 · 0 · 23 May 2024

An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno
MQ · 135 · 42 · 0 · 22 Apr 2024

On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
286 · 22 · 0 · 28 Feb 2024

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, SangLyul Cho, Bonggeun Sim, Jae W. Lee
MQ · 129 · 20 · 0 · 16 Feb 2024

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu
MQ · 101 · 7 · 0 · 08 Feb 2024

CBQ: Cross-Block Quantization for Large Language Models
Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun-feng Zhang, Wei Li, ..., Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang
MQ · 144 · 17 · 0 · 13 Dec 2023

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer
MQ · 104 · 7 · 0 · 11 Oct 2023