IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models

Fine-tuning pre-trained diffusion models under limited budgets has seen great success. In particular, recent advances that directly fine-tune quantized weights using Low-Rank Adaptation (LoRA) further reduce training costs. Despite this progress, we point out that existing adaptation recipes are not inference-efficient. Specifically, additional post-training quantization (PTQ) on the tuned weights is needed during deployment, which results in a noticeable performance drop when the bit-width is low. Based on this observation, we introduce IntLoRA, which adapts quantized diffusion models with integer-type low-rank parameters so that inference efficiency is built in during tuning. Specifically, IntLoRA enables pre-trained weights to remain quantized during training, facilitating fine-tuning on consumer-level GPUs. During inference, IntLoRA weights can be seamlessly merged into the pre-trained weights to directly obtain quantized downstream weights without PTQ. Extensive experiments show that IntLoRA achieves significant speedups in both training and inference without losing performance.
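To make the high-level idea concrete, below is a minimal PyTorch sketch of an integer-friendly LoRA linear layer: the frozen pre-trained weight is stored as an int8 tensor with a float scale, only the low-rank factors are trained, and at deployment the adapter is rounded onto the same integer grid before merging, so the merged downstream weight needs no extra PTQ pass. This is an illustration of the abstract's description under my own assumptions, not the paper's actual IntLoRA formulation; the names `IntLoRALinearSketch`, `uniform_quantize`, and `merge_to_int` are hypothetical.

```python
# Illustrative sketch only (assumed implementation, not the paper's method):
# keep the base weight quantized during fine-tuning and merge the low-rank
# update on the integer grid, avoiding a separate PTQ step at deployment.
import torch
import torch.nn as nn


def uniform_quantize(w: torch.Tensor, bits: int = 8):
    """Symmetric uniform quantization; returns integer weights and a float scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_int = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return w_int.to(torch.int8), scale


class IntLoRALinearSketch(nn.Module):
    """Frozen int8 base weight plus trainable low-rank adapter (illustrative)."""

    def __init__(self, in_features: int, out_features: int, rank: int = 4, bits: int = 8):
        super().__init__()
        base = torch.randn(out_features, in_features) * 0.02  # stand-in pre-trained weight
        w_int, scale = uniform_quantize(base, bits)
        # Frozen, integer-typed pre-trained weight and its quantization scale.
        self.register_buffer("w_int", w_int)
        self.register_buffer("scale", scale)
        # Trainable low-rank factors; only these receive gradients.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.bits = bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize the frozen path on the fly and add the low-rank path.
        w = self.w_int.float() * self.scale
        return x @ (w + self.lora_b @ self.lora_a).T

    @torch.no_grad()
    def merge_to_int(self):
        """Round the adapter onto the base weight's integer grid and fold it in,
        yielding a quantized downstream weight without a further PTQ pass."""
        delta_int = torch.round((self.lora_b @ self.lora_a) / self.scale)
        qmax = 2 ** (self.bits - 1) - 1
        merged = torch.clamp(self.w_int.float() + delta_int, -qmax - 1, qmax)
        return merged.to(torch.int8), self.scale


# Usage: train only the LoRA factors, then export an integer weight directly.
layer = IntLoRALinearSketch(64, 64)
out = layer(torch.randn(2, 64))
w_int_merged, scale = layer.merge_to_int()
```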
@article{guo2025_2410.21759,
  title   = {IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models},
  author  = {Hang Guo and Yawei Li and Tao Dai and Shu-Tao Xia and Luca Benini},
  journal = {arXiv preprint arXiv:2410.21759},
  year    = {2025}
}