Communication-Efficient Federated Learning through Importance Sampling

Abstract

The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup where the server, throughout the training process, has side information in the form of a pre-data distribution $p_{\theta}$ that is close to the client's distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$'s and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks such as FedPM, Federated SGLD, and QSGD to attain the same (and often higher) test accuracy with up to $50$ times reduction in the bitrate.
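
To make the mechanism behind the bitrate claim concrete, below is a minimal illustrative sketch (not the authors' implementation) of importance-sampling compression with server side information. It assumes a univariate Gaussian client distribution $q_{\phi^{(n)}}$ and server prior $p_{\theta}$, a pseudo-random seed shared by client and server, and a candidate pool of size roughly $\exp(D_{KL}(q \| p))$, so that the transmitted index costs about $D_{KL}(q \| p)$ bits. All function names and parameters here are hypothetical.

# Sketch of importance-sampling compression with server side information.
# The client and server share a PRNG seed, so only the index of the selected
# candidate needs to be transmitted (~ log2(n_candidates) bits).
import numpy as np
from scipy.stats import norm

def kl_gaussians(mu_q, sig_q, mu_p, sig_p):
    # KL(q || p) in nats for univariate Gaussians.
    return np.log(sig_p / sig_q) + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5

def client_encode(mu_q, sig_q, mu_p, sig_p, seed, n_candidates):
    # Draw candidates from the server-side prior p using the shared seed.
    rng = np.random.default_rng(seed)
    candidates = rng.normal(mu_p, sig_p, size=n_candidates)
    # Importance weights q(x) / p(x), computed in log space for stability.
    log_w = norm.logpdf(candidates, mu_q, sig_q) - norm.logpdf(candidates, mu_p, sig_p)
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    # Select one candidate with client-private randomness; send only its index.
    return np.random.default_rng().choice(n_candidates, p=probs)

def server_decode(index, mu_p, sig_p, seed, n_candidates):
    # Regenerate the same candidate pool from the shared seed and pick the index.
    rng = np.random.default_rng(seed)
    candidates = rng.normal(mu_p, sig_p, size=n_candidates)
    return candidates[index]

# Usage: choose n_candidates ~ exp(KL(q || p)), so the index costs ~KL(q || p) bits.
mu_q, sig_q, mu_p, sig_p, seed = 0.4, 0.1, 0.0, 0.5, 123
n_candidates = int(np.exp(kl_gaussians(mu_q, sig_q, mu_p, sig_p))) + 1
idx = client_encode(mu_q, sig_q, mu_p, sig_p, seed, n_candidates)
print(server_decode(idx, mu_p, sig_p, seed, n_candidates))  # approx. a sample from q

The decoded value is (approximately) distributed according to the client's $q_{\phi^{(n)}}$, while the number of bits sent per round scales with $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ rather than with the dimension of the update; this is only a sketch of the general importance-sampling idea, not the specific scheme or its theoretical guarantees from the paper.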
