
CAFe: Cost and Age aware Federated Learning

S. Liyanaarachchi
Kanchana Thilakarathna
S. Ulukus
Abstract

In many federated learning (FL) models, a common strategy to ensure progress in the training process is to wait for at least $M$ out of the total $N$ clients to send back their local gradients within a reporting deadline $T$, once the parameter server (PS) has broadcast the global model. If fewer than $M$ clients report back within the deadline, the round is considered failed and is restarted from scratch. Otherwise, the round is deemed successful and the local gradients of all the clients that responded are used to update the global model. In either case, the clients that failed to report an update within the deadline have wasted their computational resources. A tighter deadline (small $T$) combined with a larger required number of participating clients (large $M$) leads to many failed rounds, and therefore greater communication cost and wasted computation. However, a larger $T$ lengthens the round duration, whereas a smaller $M$ may lead to noisy gradients. Therefore, the parameters $M$ and $T$ must be optimized so that communication cost and resource wastage are minimized while an acceptable convergence rate is maintained. In this regard, we show that the average age of a client at the PS appears explicitly in the theoretical convergence bound, and can therefore be used as a metric to quantify the convergence of the global model. We provide an analytical scheme for selecting the parameters $M$ and $T$ in this setting.
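To make the trade-off between $M$ and $T$ concrete, the following minimal sketch (not from the paper) simulates the deadline-based round protocol described above. It assumes i.i.d. exponential client response times, a modeling choice of this sketch rather than the paper's analysis; all parameter values and the wasted-computation accounting are illustrative.

```python
# Illustrative sketch of the deadline-based FL round protocol from the
# abstract: a round succeeds only if at least M of N clients respond
# within deadline T. Response-time distribution and parameters are
# assumptions of this sketch, not taken from the paper.
import numpy as np

def simulate_rounds(N=100, M=60, T=1.0, rate=1.0, num_rounds=10_000, seed=0):
    """Estimate the round failure rate and the average number of wasted
    client computations per round."""
    rng = np.random.default_rng(seed)
    failed = 0
    wasted = 0  # client updates computed but never used by the PS
    for _ in range(num_rounds):
        response_times = rng.exponential(1.0 / rate, size=N)
        responders = int(np.sum(response_times <= T))
        if responders < M:
            failed += 1
            wasted += N  # round restarted from scratch: all N updates discarded
        else:
            wasted += N - responders  # stragglers past the deadline are wasted
    return failed / num_rounds, wasted / num_rounds

if __name__ == "__main__":
    for T in (0.5, 1.0, 2.0):
        fail_rate, avg_waste = simulate_rounds(T=T)
        print(f"T={T}: failure rate={fail_rate:.3f}, "
              f"avg wasted computations/round={avg_waste:.1f}")
```

Running this sketch shows the tension the abstract describes: shrinking $T$ (or raising $M$) drives up the failure rate and the wasted computation per round, while enlarging $T$ reduces failures at the cost of longer rounds.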
