Faster Non-Convex Federated Learning via Global and Local Momentum

7 December 2020

Rudrajit Das

Anish Acharya

Abolfazl Hashemi

Sujay Sanghavi

Inderjit S. Dhillon

Ufuk Topcu

FedML


Abstract

In this paper, we propose FedGLOMO, the first (first-order) FL algorithm that achieves the optimal iteration complexity (i.e., matching the known lower bound) on smooth non-convex objectives, without using the clients' full gradients in each round. The key algorithmic idea behind this optimal complexity is to apply carefully chosen momentum terms that promote variance reduction in both the clients' local updates and the server's global update. The algorithm remains provably optimal even with compressed communication between the clients and the server, which is an important consideration when deploying FL algorithms in practice. Our experiments illustrate the intrinsic variance-reduction effect of FedGLOMO, which implicitly suppresses client drift in settings with heterogeneous data distributions and promotes communication efficiency. As a precursor to FedGLOMO, we also propose FedLOMO, which applies momentum only in the local client updates. We establish that FedLOMO enjoys improved convergence rates under common non-convex settings compared to prior work, and does so under fewer assumptions.
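The abstract does not spell out the update rules. To show how a momentum term can act as a variance-reduction device, here is a minimal sketch of a STORM-style recursive momentum estimator. This is a known technique from the stochastic optimization literature, used here as an assumption; it is not claimed to be FedGLOMO's exact local or global update. The sketch runs on a single client with a hypothetical noisy objective f(x; xi) = 0.5 (x - xi)^2, where xi ~ N(0, 1). The function names `stochastic_grad` and `storm_sgd`, and the values of `lr` and `beta`, are illustrative choices.

```python
import random


def stochastic_grad(x: float, xi: float) -> float:
    # Gradient of the hypothetical per-sample loss f(x; xi) = 0.5 * (x - xi)**2.
    return x - xi


def storm_sgd(x0: float, lr: float = 0.1, beta: float = 0.1,
              steps: int = 500, seed: int = 0) -> float:
    """Minimize E[f(x; xi)] (optimum x = 0) with a STORM-style momentum estimator.

    Hypothetical illustration only: beta = 1 recovers plain SGD, and smaller
    beta relies more on the recursive correction term.
    """
    rng = random.Random(seed)
    x_prev = x0
    # Initialize the estimator with a single plain stochastic gradient.
    d = stochastic_grad(x0, rng.gauss(0.0, 1.0))
    x = x0 - lr * d
    for _ in range(steps):
        xi = rng.gauss(0.0, 1.0)
        g_new = stochastic_grad(x, xi)       # fresh sample at the current iterate
        g_old = stochastic_grad(x_prev, xi)  # same sample at the previous iterate
        # Momentum update: mix the fresh gradient with the old estimate,
        # corrected by the gradient difference evaluated on the same sample.
        d = beta * g_new + (1.0 - beta) * (d + g_new - g_old)
        x_prev, x = x, x - lr * d
    return x


print(storm_sgd(5.0))             # momentum-based variance-reduced estimate
print(storm_sgd(5.0, beta=1.0))   # plain SGD for comparison
```

The correction term `g_new - g_old` is computed on the same sample `xi` at two nearby iterates. As a result, the noise carried over from the old estimate `d` is damped by a factor of `(1 - beta)` per step rather than being resampled from scratch, and this is the sense in which momentum "promotes variance reduction."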
