BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook

24 May 2025
Hao Gu, Lujun Li, Zheyu Wang, Bei Liu, Qiyuan Zhu, Sirui Han, Yike Guo
Main: 9 pages, 11 figures, 8 tables. Bibliography: 3 pages. Appendix: 10 pages.
Abstract

Binary quantization represents the most extreme form of large language model (LLM) compression, reducing weights to ±1 for maximal memory and computational efficiency. While recent sparsity-aware binarization methods achieve sub-1-bit compression by pruning redundant binary weights, they suffer from three critical challenges: performance deterioration, computational complexity from sparse mask management, and limited hardware compatibility. In this paper, we present BTC-LLM, a novel sub-1-bit LLM quantization framework that leverages adaptive weight transformation and binary pattern clustering to overcome these limitations, delivering both superior accuracy and efficiency. Our approach incorporates two key innovations: (1) a Learnable Transformation that optimizes invertible scaling and rotation matrices to align binarized weights with full-precision distributions, enabling incoherence processing to enhance layer-wise representation quality; (2) a Flash and Accurate Binary Codebook that identifies recurring binary vector clusters, compressing them into compact indices with tailored distance metrics and sign-based centroid updates. This eliminates the need for sparse masks, enabling efficient inference on standard hardware. Our code is available at this https URL.
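
The two components named in the abstract can be pictured with a short sketch. The NumPy code below is only an illustration of the general recipe the abstract describes, not the authors' implementation: the learned invertible scaling/rotation is replaced here by a random orthogonal matrix, and the function names (btc_style_quantize, random_orthogonal), the group size, codebook size, and iteration count are all assumptions made for the example.

import numpy as np

def random_orthogonal(n, rng):
    # Stand-in for the paper's *learned* invertible rotation: a random
    # orthogonal matrix (QR of a Gaussian), used only to illustrate
    # incoherence processing before binarization.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def btc_style_quantize(W, group_size=16, num_codes=256, iters=10, seed=0):
    """Illustrative sketch of the two ideas in the abstract:
    (1) rotate weights with an invertible transform before binarization,
    (2) cluster the resulting +-1 groups into a small binary codebook
    with Hamming-style assignment and sign (majority-vote) centroid
    updates. Names and defaults are assumptions, not the paper's code."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    assert cols % group_size == 0

    # (1) Incoherence processing: W @ R with R orthogonal, so the
    # transform can be undone at inference (random here, learned in BTC-LLM).
    R = random_orthogonal(cols, rng)
    Wt = W @ R

    # Per-row scale + sign binarization: Wt ~ alpha * B, B in {-1, +1}
    alpha = np.abs(Wt).mean(axis=1, keepdims=True)
    B = np.sign(Wt)
    B[B == 0] = 1.0

    # (2) Binary codebook over length-`group_size` sign vectors.
    groups = B.reshape(-1, group_size)
    centroids = groups[rng.choice(len(groups), num_codes, replace=False)].copy()
    for _ in range(iters):
        # For +-1 vectors, maximizing the inner product minimizes the
        # Hamming distance, so assignment is one matmul plus an argmax.
        assign = (groups @ centroids.T).argmax(axis=1)
        for k in range(num_codes):
            members = groups[assign == k]
            if len(members):
                # Sign-based centroid update = per-dimension majority vote
                centroids[k] = np.sign(members.sum(axis=0) + 1e-6)

    # Compressed form: per-group codebook indices plus the codebook itself.
    indices = (groups @ centroids.T).argmax(axis=1)
    indices = indices.astype(np.min_scalar_type(num_codes - 1))

    # Dequantize (and undo the rotation) for a quick fidelity check.
    W_hat = (alpha * centroids[indices].reshape(rows, cols)) @ R.T
    return indices, centroids, alpha, R, W_hat

Under these assumed settings, each group of 16 binary weights is stored as a single 8-bit index into a 256-entry codebook, i.e. roughly 0.5 bits per weight before counting the small codebook and per-row scales. That is the sense in which a binary codebook can reach sub-1-bit storage while avoiding the sparse masks that pruning-based sub-1-bit methods require.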

@article{gu2025_2506.12040,
  title={BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook},
  author={Hao Gu and Lujun Li and Zheyu Wang and Bei Liu and Qiyuan Zhu and Sirui Han and Yike Guo},
  journal={arXiv preprint arXiv:2506.12040},
  year={2025}
}