
IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models

Main: 8 pages
Appendix: 12 pages
Bibliography: 2 pages
Figures: 19
Tables: 6
Abstract

Fine-tuning pre-trained diffusion models under limited budgets has achieved great success. In particular, recent advances that directly fine-tune the quantized weights using Low-Rank Adaptation (LoRA) further reduce training costs. Despite this progress, we point out that existing adaptation recipes are not inference-efficient. Specifically, an additional post-training quantization (PTQ) pass on the tuned weights is needed during deployment, which causes a noticeable performance drop when the bit-width is low. Based on this observation, we introduce IntLoRA, which adapts quantized diffusion models with integer-type low-rank parameters so that inference efficiency is built in during tuning. Specifically, IntLoRA allows the pre-trained weights to remain quantized during training, facilitating fine-tuning on consumer-level GPUs. During inference, the IntLoRA weights can be seamlessly merged into the pre-trained weights to directly obtain quantized downstream weights without PTQ. Extensive experiments show that IntLoRA achieves significant speedups in both training and inference without losing performance.

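The abstract does not specify the exact quantization scheme, so the following is only a minimal NumPy sketch of the general idea it describes: keep the base weights as low-bit integers during tuning, and merge an integer-valued low-rank update into them so the downstream weights come out already quantized, with no extra PTQ pass. The uniform symmetric quantizer, the shared scale, and all names (quantize, W_q, A, B) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of integer-domain LoRA merging (assumptions noted above).
import numpy as np

def quantize(w, num_bits=4):
    """Uniform symmetric quantization: returns integer codes and a scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 4

# Pre-trained weight, stored as low-bit integers during fine-tuning.
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
W_q, scale = quantize(W, num_bits=4)

# Integer-type low-rank adaptation parameters (assumed here to live on the same
# quantization grid as the base weights, so A @ B is an integer update).
A = rng.integers(-2, 3, size=(d_out, rank), dtype=np.int32)
B = rng.integers(-2, 3, size=(rank, d_in), dtype=np.int32)

# Merge for deployment: integer arithmetic only, so the result is already a
# quantized downstream weight and no separate PTQ step is required.
W_merged_q = W_q.astype(np.int32) + A @ B

# At inference, the merged integers are dequantized with the existing scale.
W_merged = W_merged_q.astype(np.float32) * scale
print(W_merged.shape)  # (64, 64)
```

The contrast with standard LoRA on quantized models is that there the floating-point update must be added to dequantized weights, forcing a re-quantization (PTQ) of the merged result; keeping the update in the integer domain avoids that step.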
@article{guo2025_2410.21759,
  title={IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models},
  author={Hang Guo and Yawei Li and Tao Dai and Shu-Tao Xia and Luca Benini},
  journal={arXiv preprint arXiv:2410.21759},
  year={2025}
}
