Pushing the Limits of Large Language Model Quantization via the Linearity Theorem

26 November 2024

Papers citing "Pushing the Limits of Large Language Model Quantization via the Linearity Theorem"

6 / 6 papers shown

Addition is almost all you need: Compressing neural networks with double binary factorization
Vladimír Boža, Vladimír Macko
MQ, 17 0 0, 16 May 2025

GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim, Marwa El Halabi, W. Park, Clemens JS Schaefer, Deokjae Lee, Yeonhong Park, Jae W. Lee, Hyun Oh Song
MQ, 34 0 0, 11 May 2025

Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong, Yushun Zhang, Zhi-Quan Luo, Jianfeng Yao, Ruoyu Sun
31 0 0, 05 May 2025

Hessian of Perplexity for Large Language Models by PyTorch autograd (Open Source)
Ivan Ilin
26 0 0, 06 Apr 2025

SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang, Ligong Han, Kai Xu, Akash Srivastava
MQ, 51 0 0, 31 Mar 2025

CE-LoRA: Computation-Efficient LoRA Fine-Tuning for Language Models
Guanduo Chen, Yutong He, Yipeng Hu, Kun Yuan, Binhang Yuan
54 0 0, 03 Feb 2025