
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models

Main: 9 pages · 13 figures · Bibliography: 5 pages · 25 tables · Appendix: 8 pages
Abstract

Large Language Models (LLMs) have achieved state-of-the-art results across diverse domains, yet their development remains reliant on vast amounts of publicly available data, raising concerns about data scarcity and the lack of access to domain-specific, sensitive information. Federated Learning (FL) presents a compelling framework to address these challenges by enabling decentralized fine-tuning of pre-trained LLMs without sharing raw data. However, the compatibility and performance of pre-trained LLMs in FL settings remain largely underexplored. We introduce the FlowerTune LLM Leaderboard, a first-of-its-kind benchmarking suite designed to evaluate federated fine-tuning of LLMs across four diverse domains: general NLP, finance, medical, and coding. Each domain includes federated instruction-tuning datasets and domain-specific evaluation metrics. Our results, obtained through a collaborative, open-source, community-driven approach, provide the first comprehensive comparison across 26 pre-trained LLMs with different aggregation and fine-tuning strategies under federated settings, offering actionable insights into model performance, resource constraints, and domain adaptation. This work lays the foundation for developing privacy-preserving, domain-specialized LLMs for real-world applications.
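To make the federated fine-tuning setting concrete, the sketch below illustrates the general pattern the leaderboard evaluates: each client fine-tunes lightweight LoRA adapters on its private data, and a server aggregates only the adapter weights with a strategy such as FedAvg. This is a minimal illustration, not the FlowerTune implementation: it assumes the flwr, peft, transformers, and torch packages; the checkpoint name, the helper functions (build_model, get_lora_weights, set_lora_weights), and the stubbed local_finetune step are hypothetical, and the exact Flower simulation API varies across library versions.

```python
# Minimal sketch (not the paper's code): federated LoRA fine-tuning of a
# causal LM with Flower's simulation API and Hugging Face PEFT.
import flwr as fl
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

MODEL_NAME = "EleutherAI/pythia-70m"  # small placeholder checkpoint


def build_model():
    # Wrap the pre-trained model with LoRA adapters; only adapters are trainable.
    base = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
    return get_peft_model(base, lora_cfg)


def get_lora_weights(model):
    # Only LoRA adapter tensors are exchanged, keeping communication small.
    return [p.detach().cpu().numpy() for n, p in model.named_parameters() if "lora_" in n]


def set_lora_weights(model, weights):
    names = [n for n, _ in model.named_parameters() if "lora_" in n]
    state = {n: torch.tensor(w) for n, w in zip(names, weights)}
    model.load_state_dict(state, strict=False)


def local_finetune(model, client_id):
    # Hypothetical placeholder: a few steps of instruction tuning on the
    # client's private data shard would run here (raw data never leaves the client).
    pass


class LLMClient(fl.client.NumPyClient):
    def __init__(self, client_id):
        self.client_id = client_id
        self.model = build_model()

    def get_parameters(self, config):
        return get_lora_weights(self.model)

    def fit(self, parameters, config):
        set_lora_weights(self.model, parameters)   # receive global adapters
        local_finetune(self.model, self.client_id)  # local training step
        return get_lora_weights(self.model), 1, {}  # send updated adapters

    def evaluate(self, parameters, config):
        set_lora_weights(self.model, parameters)
        return 0.0, 1, {}  # domain-specific evaluation would go here


def client_fn(cid: str):
    return LLMClient(int(cid)).to_client()


if __name__ == "__main__":
    # Simulate a small federation; FedAvg averages the clients' adapter weights.
    fl.simulation.start_simulation(
        client_fn=client_fn,
        num_clients=4,
        config=fl.server.ServerConfig(num_rounds=3),
        strategy=fl.server.strategy.FedAvg(fraction_fit=1.0),
    )
```

Exchanging only adapter weights rather than full model parameters is what makes federated fine-tuning of billion-parameter LLMs communication-feasible; the leaderboard compares such fine-tuning and aggregation choices per domain.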

@article{gao2025_2506.02961,
  title={FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models},
  author={Yan Gao and Massimo Roberto Scamarcia and Javier Fernandez-Marques and Mohammad Naseri and Chong Shen Ng and Dimitris Stripelis and Zexi Li and Tao Shen and Jiamu Bai and Daoyuan Chen and Zikai Zhang and Rui Hu and InSeo Song and Lee KangYoon and Hong Jia and Ting Dang and Junyan Wang and Zheyuan Liu and Daniel Janes Beutel and Lingjuan Lyu and Nicholas D. Lane},
  journal={arXiv preprint arXiv:2506.02961},
  year={2025}
}