Traceable Black-box Watermarks for Federated Learning

Due to the distributed nature of Federated Learning (FL) systems, each local client has access to the global model, posing a critical risk of model leakage. Existing works have explored injecting watermarks into local models to enable intellectual property protection. However, these methods either focus on non-traceable watermarks or on traceable but white-box watermarks. We identify a gap in the literature regarding the formal definition of traceable black-box watermarking and the formulation of the problem of injecting such watermarks into FL systems. In this work, we first formalize the problem of injecting traceable black-box watermarks into FL. Building on this formulation, we propose a novel server-side watermarking method that creates a traceable watermarked model for each client, enabling verification of model leakage in black-box settings. To achieve this, our method partitions the model parameter space into two distinct regions: the main task region and the watermarking region. A personalized global model is then constructed for each client by aggregating only the main task region while preserving the watermarking region. Each model subsequently learns a unique watermark exclusively within its watermarking region using a distinct watermark dataset before being sent back to the local client. Extensive experiments across various FL systems demonstrate that our method ensures the traceability of all watermarked models while preserving their main task performance.
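The partition-and-aggregate scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes models are flat parameter vectors, uses a random boolean mask to mark the watermarking region (the paper's actual partitioning criterion is not specified in the abstract), and all function names are hypothetical.

```python
import numpy as np

def make_mask(num_params, wm_fraction, seed=0):
    """Mark a small, fixed fraction of parameter positions as the
    watermarking region (illustrative choice: random positions)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(num_params, dtype=bool)
    wm_idx = rng.choice(num_params, size=int(num_params * wm_fraction),
                        replace=False)
    mask[wm_idx] = True
    return mask  # True = watermarking region, False = main task region

def personalized_aggregate(client_params, wm_mask):
    """Server-side step: average only the main task region across clients;
    each client's watermarking region is left untouched, so the unique
    watermark learned there survives aggregation."""
    stacked = np.stack(client_params)            # (num_clients, num_params)
    main_avg = stacked[:, ~wm_mask].mean(axis=0)
    personalized = []
    for p in client_params:
        out = p.copy()
        out[~wm_mask] = main_avg                 # shared main-task weights
        # out[wm_mask] stays client-specific: carries that client's watermark
        personalized.append(out)
    return personalized
```

After this step, the server would fine-tune each personalized model on that client's distinct watermark dataset, updating only the masked positions, before returning the model to the client.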
@article{xu2025_2505.13651,
  title={Traceable Black-box Watermarks for Federated Learning},
  author={Jiahao Xu and Rui Hu and Olivera Kotevska and Zikai Zhang},
  journal={arXiv preprint arXiv:2505.13651},
  year={2025}
}