
Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets

Abstract

This paper develops an ensemble method for fine-tuning a language model on multiple datasets. Existing methods, such as quantized LoRA (QLoRA), are efficient when adapting to a single dataset. When training on multiple datasets of different tasks, a common setup in practice, it remains unclear how to design an efficient adaptation for fine-tuning language models. We propose to use an ensemble of multiple smaller adapters instead of a single adapter per task. We design an efficient algorithm that partitions n datasets into m groups, where m is typically much smaller than n in practice, and trains one adapter for each group before taking a weighted combination to form the ensemble. The algorithm leverages a first-order approximation property of low-rank adaptation to quickly estimate the fine-tuning performance of dataset combinations: since methods like LoRA stay close to the base model, we use the gradients of the base model to estimate its behavior during fine-tuning. Empirically, this approximation holds with less than 1% error on models with up to 34 billion parameters, yielding estimates of true fine-tuning performance within 5% error while speeding up computation by 105 times compared to base fine-tuning. When applied to fine-tune Llama and GPT models on ten text classification tasks, our approach provides up to 10% higher average test accuracy than QLoRA, with only 9% more FLOPs. On a Llama model with 34 billion parameters, an ensemble of QLoRA increases test accuracy by 3% compared to QLoRA, with only 8% more FLOPs.
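
The abstract compresses two ideas: (1) because LoRA-style updates stay close to the base model, the loss after fine-tuning on a group of datasets can be linearized around the base weights using only base-model gradients, and (2) this cheap estimate can drive the partition of n datasets into m adapter groups. The sketch below illustrates both on synthetic data. It is a minimal illustration under stated assumptions, not the paper's algorithm: the helper names (estimated_loss, group_score), the one-step gradient surrogate for the adapter update, and the greedy assignment rule are all hypothetical stand-ins.

import numpy as np

rng = np.random.default_rng(0)

# --- Toy setup (assumptions, not the paper's experiments) ----------------
n, m, d = 10, 3, 64            # n datasets, m adapter groups, parameter dim
lr = 0.1                       # step size of the surrogate adapter update
grads = rng.normal(size=(n, d))          # g_i: base-model gradient on task i
eval_losses = rng.uniform(1.0, 2.0, n)   # L_i(theta0): base loss on task i
eval_grads = grads + 0.1 * rng.normal(size=(n, d))  # grad of each eval loss

def estimated_loss(task, group):
    """First-order estimate of task `task`'s loss after fine-tuning one
    shared adapter on the datasets in `group`. We approximate the adapted
    weights by one aggregated gradient step,
        theta_S ~= theta0 - lr * mean_{i in S} g_i,
    and linearize the evaluation loss around theta0:
        L_task(theta_S) ~= L_task(theta0) + <grad L_task(theta0), theta_S - theta0>.
    """
    delta = -lr * grads[list(group)].mean(axis=0)
    return eval_losses[task] + eval_grads[task] @ delta

def group_score(group):
    # Average estimated loss of the group's own tasks under a shared adapter.
    return np.mean([estimated_loss(t, group) for t in group])

# Greedy partition of the n datasets into m groups: seed each group with one
# dataset, then assign each remaining dataset to the group whose estimated
# loss it increases the least.
order = list(rng.permutation(n))
groups = [[order.pop()] for _ in range(m)]
for t in order:
    best = min(range(m), key=lambda j: group_score(groups[j] + [t]))
    groups[best].append(t)

for j, g in enumerate(groups):
    print(f"adapter {j}: datasets {sorted(g)}, est. loss {group_score(g):.3f}")

The point of the linearization is cost: scoring a candidate group takes one gradient average and a dot product per task rather than a fine-tuning run, which is the kind of saving behind the roughly 105x speedup the abstract reports for estimating fine-tuning performance.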

@article{li2025_2505.21930,
  title={Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets},
  author={Dongyue Li and Ziniu Zhang and Lu Wang and Hongyang R. Zhang},
  journal={arXiv preprint arXiv:2505.21930},
  year={2025}
}