SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

19 June 2025

Main:8 Pages

10 Figures

Bibliography:4 Pages

15 Tables

Appendix:4 Pages

Abstract

Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2 times and achieves a measured speedup of up to 1.6 times while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
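
The abstract describes the SVD sparsity estimator only at a high level. The sketch below is one plausible reading, not the authors' implementation. It assumes the estimator scores the output channels of a frozen linear layer using a truncated SVD of that layer's weight, keeps the top-scoring channels for the current batch, and computes the frozen-weight path only for those channels while the LoRA adapter path stays dense. The function names (`build_svd_estimator`, `select_channels`, `sparse_lora_forward`), the absolute-sum scoring rule, and the fixed `keep` budget are all hypothetical. The paper's actual scoring, per-layer coverage, and sparsity schedule across tokens and training steps are not specified here.

```python
import numpy as np

# Hypothetical sketch: SVD-based contextual-sparsity estimator for one frozen
# linear layer with a LoRA adapter. Not the SparseLoRA reference code.

def build_svd_estimator(W: np.ndarray, rank: int):
    """Offline and training-free: factor the frozen weight W (d_out x d_in)
    into a rank-`rank` approximation W ~= P @ Q."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    P = U[:, :rank] * S[:rank]  # (d_out, rank), singular values folded in
    Q = Vt[:rank, :]            # (rank, d_in)
    return P, Q

def select_channels(x: np.ndarray, P: np.ndarray, Q: np.ndarray, keep: int):
    """Score output channels with the cheap low-rank proxy of x @ W.T and
    return the sorted indices of the `keep` highest-scoring channels."""
    approx = (x @ Q.T) @ P.T                # (tokens, d_out) low-rank estimate
    scores = np.abs(approx).sum(axis=0)     # assumed rule: aggregate |activation| over tokens
    return np.sort(np.argsort(scores)[-keep:])

def sparse_lora_forward(x, W, A, B, idx):
    """Frozen base path uses only the selected rows of W; LoRA path stays dense.
    A is (r, d_in), B is (d_out, r). Unselected base outputs are treated as zero."""
    out = (x @ A.T) @ B.T                   # LoRA update for all output channels
    out[:, idx] += x @ W[idx].T             # base contribution for kept channels only
    return out

# Usage: keep half of the output channels of a 64 -> 128 layer.
rng = np.random.default_rng(0)
d_in, d_out, r_lora, tokens = 64, 128, 8, 16
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r_lora, d_in)) * 0.01
B = np.zeros((d_out, r_lora))               # common LoRA init: B starts at zero
x = rng.standard_normal((tokens, d_in))

P, Q = build_svd_estimator(W, rank=8)
idx = select_channels(x, P, Q, keep=d_out // 2)
y = sparse_lora_forward(x, W, A, B, idx)
```

The design choice being illustrated is that the estimator costs roughly `rank * (d_in + d_out)` multiply-adds per token, compared with `d_in * d_out` for the full product. This makes it cheap enough to run on every batch, while the expensive frozen-weight product (and its backward pass) shrinks in proportion to the fraction of channels kept.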


@article{khaki2025_2506.16500,
  title={ SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity },
  author={ Samir Khaki and Xiuyu Li and Junxian Guo and Ligeng Zhu and Chenfeng Xu and Konstantinos N. Plataniotis and Amir Yazdanbakhsh and Kurt Keutzer and Song Han and Zhijian Liu },
  journal={arXiv preprint arXiv:2506.16500},
  year={ 2025 }
}
