Intrinsic Structure as a Proxy for Saliency: SVD-Based Weight Preservation for Mixed-Precision Quantization in Large Language Models

1 December 2025

Shashank Landge

Abhishek Patil

Tejas Kamble

Bhushan Buddhivant

Priyanka Joshi


Main: 3 pages

3 figures

Bibliography: 2 pages

Abstract

As Large Language Models (LLMs) continue to grow in parameter count, deploying them on commodity hardware has become increasingly challenging. Post-Training Quantization (PTQ) addresses this by reducing the numerical precision of model weights, typically to 4 bits or fewer. However, uniform quantization often causes significant performance degradation because of "outlier features": weights that, although few in number, are critical for maintaining model accuracy. Current state-of-the-art methods such as AWQ (Activation-aware Weight Quantization) and SpQR (Sparse-Quantized Representation) rely on calibration data to identify these salient weights, using activation magnitudes or Hessian-based sensitivity. In scenarios where data privacy is paramount or calibration data is unavailable, these methods cannot be applied.
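To make the failure mode concrete, here is a minimal NumPy sketch that contrasts symmetric per-row round-to-nearest (RTN) 4-bit quantization, used as the "uniform" baseline, with a mixed-precision variant that keeps a small fraction of weights in full precision. The abstract does not specify how the paper derives saliency from SVD structure, so `svd_saliency` is a hypothetical, data-free stand-in of our own: it scores each weight by its magnitude in a rank-`k` SVD reconstruction of the matrix. The names `quantize_rtn` and `mixed_precision`, and the settings `rank=8` and `keep_frac=0.01`, are also illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def quantize_rtn(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-row round-to-nearest quantization; returns dequantized weights."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed integers
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale


def svd_saliency(w: np.ndarray, rank: int = 8) -> np.ndarray:
    """HYPOTHETICAL data-free score: |entry| of the rank-k SVD reconstruction of w."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    w_lowrank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return np.abs(w_lowrank)


def mixed_precision(w: np.ndarray, saliency: np.ndarray,
                    keep_frac: float = 0.01, bits: int = 4):
    """Keep the top keep_frac weights by saliency in full precision; RTN-quantize the rest."""
    k = max(1, int(keep_frac * w.size))
    top = np.argpartition(saliency.ravel(), -k)[-k:]  # indices of the k largest scores
    mask = np.zeros(w.size, dtype=bool)
    mask[top] = True
    mask = mask.reshape(w.shape)
    # Zero the preserved weights before quantizing so they do not inflate the row scales.
    w_q = quantize_rtn(np.where(mask, 0.0, w), bits)
    return np.where(mask, w, w_q), mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.02, size=(256, 256))
    rows, cols = np.arange(8) * 30, np.arange(8) * 31
    w[rows, cols] = 5.0  # inject a few large "outlier" weights

    err_rtn = np.mean((quantize_rtn(w) - w) ** 2)
    w_mp, mask = mixed_precision(w, svd_saliency(w))
    err_mp = np.mean((w_mp - w) ** 2)
    print(f"uniform 4-bit MSE: {err_rtn:.2e}  mixed-precision MSE: {err_mp:.2e}")
```

On this toy matrix, plain RTN spends the entire 4-bit range of each outlier row on a single large weight, so the ordinary weights in that row round to zero. Keeping the top 1% of SVD-scored weights in full precision, and excluding them from the scale computation, yields a lower reconstruction error. The toy does not show whether such a data-free score matches the saliency that calibration-based methods find.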
