FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments

As artificial intelligence systems increasingly operate in Real-world environments, the integration of multi-modal data sources such as vision, language, and audio presents both unprecedented opportunities and critical challenges for achieving trustworthy intelligence. In this paper, we propose a novel framework that unifies federated learning with explainable multi-modal reasoning to ensure trustworthiness in decentralized, dynamic settings. Our approach, called FedMM-X (Federated Multi-Modal Explainable Intelligence), leverages cross-modal consistency checks, client-level interpretability mechanisms, and dynamic trust calibration to address challenges posed by data heterogeneity, modality imbalance, and out-of-distribution generalization. Through rigorous evaluation across federated multi-modal benchmarks involving vision-language tasks, we demonstrate improved performance in both accuracy and interpretability while reducing vulnerabilities to adversarial and spurious correlations. Further, we introduce a novel trust score aggregation method to quantify global model reliability under dynamic client participation. Our findings pave the way toward developing robust, interpretable, and socially responsible AI systems in Real-world environments.
View on arXiv@article{balija2025_2503.19564, title={ FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments }, author={ Sree Bhargavi Balija }, journal={arXiv preprint arXiv:2503.19564}, year={ 2025 } }