GPTQT: Quantize Large Language Models Twice to Push the Efficiency

3 July 2024

Papers citing "GPTQT: Quantize Large Language Models Twice to Push the Efficiency"

6 / 6 papers shown

Title
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model BigScience Workshop : Teven Le Scao Angela Fan Christopher Akiki ... Zhongli Xie Zifan Ye M. Bras Younes Belkada Thomas Wolf VLM 392 2,388 0 09 Nov 2022
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning Elias Frantar Sidak Pal Singh Dan Alistarh MQ 101 237 0 24 Aug 2022
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models Gunho Park Baeseong Park Minsub Kim Sungjae Lee Jeonghoon Kim Beomseok Kwon S. Kwon Byeongwook Kim Youngjoo Lee Dongsoo Lee MQ 60 83 0 20 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers Z. Yao Reza Yazdani Aminabadi Minjia Zhang Xiaoxia Wu Conglong Li Yuxiong He VLM MQ 122 479 0 04 Jun 2022
Pointer Sentinel Mixture Models Stephen Merity Caiming Xiong James Bradbury R. Socher RALM 328 2,876 0 26 Sep 2016
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks Mohammad Rastegari Vicente Ordonez Joseph Redmon Ali Farhadi MQ 170 4,357 0 16 Mar 2016