2407.11062
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
10 July 2024
Mengzhao Chen
Wenqi Shao
Peng Xu
Jiahao Wang
Peng Gao
Kaipeng Zhang
Ping Luo
MQ
Papers citing
"EfficientQAT: Efficient Quantization-Aware Training for Large Language Models"
26 / 26 papers shown
Title
Scaling Law for Quantization-Aware Training
Mengzhao Chen
Chaoyi Zhang
Jing Liu
Yutao Zeng
Zeyue Xue
...
Yunshui Li
Jin Ma
Jie Huang
Xun Zhou
Ping Luo
MQ
19
0
0
20 May 2025
FedHQ: Hybrid Runtime Quantization for Federated Learning
Zihao Zheng
Ziyao Wang
Xiuping Cui
Maoliang Li
Jiayu Chen
Liang
Ang Li
Xiang Chen
FedML
MQ
19
0
0
17 May 2025
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yinsicheng Jiang
Yao Fu
Yeqi Huang
Ping Nie
Zhan Lu
...
Dayou Du
Tairan Xu
Kai Zou
Edoardo Ponti
Luo Mai
MoE
27
0
0
16 May 2025
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Deyu Cao
Samin Aref
MQ
29
0
0
14 Apr 2025
QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition
Yuxuan Hu
Xiaodong Chen
C. Li
Hongyu Chen
Jing Zhang
MQ
60
0
0
25 Mar 2025
Binary Neural Networks for Large Language Model: A Survey
Liangdong Liu
Zhitong Zheng
Cong Wang
TianHuang Su
ZhenYu Yang
MQ
70
0
0
26 Feb 2025
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu
Changsheng Zhao
Hanxian Huang
Sijia Chen
Jing Zhang
...
Yuandong Tian
Bilge Soran
Raghuraman Krishnamoorthi
Tijmen Blankevoort
Vikas Chandra
MQ
85
5
0
04 Feb 2025
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
Mengzhao Chen
Yi Liu
Jiahao Wang
Yi Bin
Wenqi Shao
Ping Luo
MQ
68
4
0
28 Jan 2025
Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang
Huanrui Yang
MQ
95
1
0
08 Dec 2024
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Chao Fang
Man Shi
Robin Geens
Arne Symons
Zhongfeng Wang
Marian Verhelst
78
0
0
24 Nov 2024
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen
Ahmed F. AbouElhamayed
Xilai Dai
Yang Wang
Marta Andronic
George A. Constantinides
Mohamed S. Abdelfattah
MQ
110
1
0
18 Nov 2024
A Comprehensive Study on Quantization Techniques for Large Language Models
Jiedong Lang
Zhehao Guo
Shuyu Huang
MQ
46
10
0
30 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
39
3
0
08 Oct 2024
ARB-LLM: Alternating Refined Binarizations for Large Language Models
Zhiteng Li
Xinyu Yan
Tianao Zhang
Haotong Qin
Dong Xie
Jiang Tian
Zhongchao Shi
Linghe Kong
Yulun Zhang
Xiaokang Yang
MQ
39
2
0
04 Oct 2024
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
Yanshu Wang
Tong Yang
Xiyan Liang
Guoan Wang
Hanning Lu
Xu Zhe
Yaoming Li
Li Weitao
MQ
46
3
0
18 Sep 2024
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing
Boyan Gao
Zheng Zhang
David A. Clifton
Shitao Xiao
Li Du
Guoqi Li
Jiajun Zhang
63
5
0
05 Jul 2024
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
Ruihao Gong
Yang Yong
Shiqiao Gu
Yushi Huang
Chentao Lv
Yunchen Zhang
Xianglong Liu
Dacheng Tao
MQ
44
7
0
09 May 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
90
77
0
07 May 2024
OneBit: Towards Extremely Low-bit Large Language Models
Yuzhuang Xu
Xu Han
Zonghan Yang
Shuo Wang
Qingfu Zhu
Zhiyuan Liu
Weidong Liu
Wanxiang Che
MQ
53
39
0
17 Feb 2024
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Dayou Du
Yijia Zhang
Shijie Cao
Jiaqi Guo
Ting Cao
Xuming Hu
Ningyi Xu
MQ
46
30
0
16 Feb 2024
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng
Jerry Chee
Qingyao Sun
Volodymyr Kuleshov
Christopher De Sa
MQ
128
101
0
06 Feb 2024
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian
Andrei Panferov
Denis Kuznedelev
Elias Frantar
Artem Babenko
Dan Alistarh
MQ
102
91
0
11 Jan 2024
A Speed Odyssey for Deployable Quantization of LLMs
Qingyuan Li
Ran Meng
Yiduo Li
Bo Zhang
Liang Li
Yifan Lu
Xiangxiang Chu
Yerui Sun
Yuchen Xie
MQ
67
7
0
16 Nov 2023
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
Saleh Ashkboos
Ilia Markov
Elias Frantar
Tingxuan Zhong
Xincheng Wang
Jie Ren
Torsten Hoefler
Dan Alistarh
MQ
SyDa
126
22
0
13 Oct 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
381
2,232
0
22 Mar 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,134
0
20 Sep 2022