Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare

The rise of electronic health records (EHRs) has unlocked new opportunities for medical research, but privacy regulations and data heterogeneity remain key barriers to large-scale machine learning. Federated learning (FL) enables collaborative modeling without sharing raw data, yet it still faces challenges in harmonizing diverse clinical datasets. This paper presents a two-step data alignment strategy that integrates ontologies and large language models (LLMs) to support secure, privacy-preserving FL in healthcare, and demonstrates its effectiveness in a real-world project involving semantic mapping of EHR data.
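To illustrate the claim that FL allows collaborative modeling without sharing raw data, here is a minimal, generic sketch of FedAvg-style aggregation, a standard FL rule in which each site trains locally and sends only model parameters, which a coordinator averages weighted by local dataset size. This is an assumption for illustration only: the abstract does not state which aggregation scheme the authors use, and the function name `fed_avg`, the two "hospitals", their parameter vectors, and their record counts are all hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Hypothetical helper: FedAvg-style weighted average of client parameters.

    Each client contributes its locally trained parameter vector, weighted by
    the number of local records it trained on. Raw records never leave a site.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter dimension")

    total = sum(client_sizes)
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Accumulate each parameter, scaled by this client's data share.
            global_weights[i] += w * n / total
    return global_weights


# Two hypothetical hospitals send only parameter vectors, not patient data.
hospital_a = [0.2, 1.0]   # trained on 100 local records
hospital_b = [0.6, 3.0]   # trained on 300 local records
global_model = fed_avg([hospital_a, hospital_b], [100, 300])
```

Averaging parameters only makes sense when every site encodes the same features in the same way, which is why harmonizing heterogeneous clinical data, the problem this paper addresses, has to happen before federated training.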
@article{kokash2025_2505.20020, title={ Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare }, author={ Natallia Kokash and Lei Wang and Thomas H. Gillespie and Adam Belloum and Paola Grosso and Sara Quinney and Lang Li and Bernard de Bono }, journal={arXiv preprint arXiv:2505.20020}, year={ 2025 } }