ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.13474
33
0

BoA: Attention-aware Post-training Quantization without Backpropagation

19 June 2024
Junhan Kim
Ho-Young Kim
Eulrang Cho
Chungman Lee
Joonyoung Kim
Yongkweon Jeon
    MQ
ArXivPDFHTML
Abstract

Post-training quantization (PTQ) is a promising solution for deploying large language models (LLMs) on resource-constrained devices. Early methods developed for smaller networks like ResNet rely on gradient-based optimization, which becomes impractical for hyper-scale LLMs with billions of parameters. While recently proposed backpropagation-free or transformation-based methods alleviate this issue, their performance remains limited by either a lack of inter-layer dependency consideration or the use of naive nearest-rounding-based integer weight assignment to save the heavy computational cost of weight optimization. We thus introduce a novel backpropagation-free PTQ algorithm that optimizes integer weights by considering inter-layer dependencies. The key innovation is the development of attention-aware Hessian matrices that capture inter-layer interactions within the attention module. Extensive experiments demonstrate that our approach not only outperforms existing weight quantization methods but also shows good synergy with conventional methods to suppress activation outliers, leading to state-of-the-art weight-activation quantization performance.

View on arXiv
@article{kim2025_2406.13474,
  title={ BoA: Attention-aware Post-training Quantization without Backpropagation },
  author={ Junhan Kim and Ho-young Kim and Eulrang Cho and Chungman Lee and Joonyoung Kim and Yongkweon Jeon },
  journal={arXiv preprint arXiv:2406.13474},
  year={ 2025 }
}
Comments on this paper