
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

Abstract

Decentralized optimization over time-varying graphs has become increasingly common in modern machine learning, with massive data stored on millions of mobile devices, such as in federated learning. This paper revisits the widely used accelerated gradient tracking method and extends it to time-varying graphs. We prove the $O((\frac{\gamma}{1-\sigma_{\gamma}})^2\sqrt{\frac{L}{\epsilon}})$ and $O((\frac{\gamma}{1-\sigma_{\gamma}})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon})$ complexities for the practical single-loop accelerated gradient tracking over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $\gamma$ and $\sigma_{\gamma}$ are two common constants characterizing the network connectivity, $\epsilon$ is the desired precision, and $L$ and $\mu$ are the smoothness and strong convexity constants, respectively. Our complexities improve significantly over those of $O(\frac{1}{\epsilon^{5/7}})$ and $O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$, respectively, which were proved in the original literature only for static graphs, where $\frac{1}{1-\sigma}$ equals $\frac{\gamma}{1-\sigma_{\gamma}}$ when the network is time-invariant. When combined with a multiple consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O(\frac{\gamma}{1-\sigma_{\gamma}})$ for the computation and communication complexities, respectively. When the network is static, by employing Chebyshev acceleration, our complexities exactly match the lower bounds without hiding any poly-logarithmic factor, for both nonstrongly convex and strongly convex problems.
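To make the setting concrete, the sketch below shows plain (non-accelerated) gradient tracking over a time-varying graph on a synthetic decentralized least-squares problem. It is a minimal illustration, not the paper's algorithm: the number of agents, the alternating ring topologies, the local data, and the step size are all assumed for this example, and the method analyzed in the paper adds Nesterov-style acceleration on top of this gradient tracking template.

```python
import numpy as np

# Hypothetical setup: 5 agents, each with a local objective
# f_i(x) = 0.5 * ||A_i x - b_i||^2; the goal is to minimize the average
# (1/n) * sum_i f_i(x) using only local gradients and communication
# through a time-varying doubly stochastic mixing matrix W_k.
rng = np.random.default_rng(0)
n_agents, dim = 5, 3
A = [rng.standard_normal((10, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(10) for _ in range(n_agents)]

def local_grad(i, x):
    """Gradient of agent i's local objective at x."""
    return A[i].T @ (A[i] @ x - b[i])

def mixing_matrix(k):
    """Doubly stochastic mixing matrix of a time-varying graph: a ring that
    alternates between nearest-neighbor and next-nearest-neighbor links at
    even and odd iterations (a stand-in for a general connected sequence)."""
    W = 0.5 * np.eye(n_agents)
    shift = 1 if k % 2 == 0 else 2
    for i in range(n_agents):
        W[i, (i + shift) % n_agents] += 0.25
        W[i, (i - shift) % n_agents] += 0.25
    return W

# Gradient tracking iteration (acceleration omitted for brevity):
#   x_{k+1} = W_k (x_k - alpha * s_k)
#   s_{k+1} = W_k s_k + g(x_{k+1}) - g(x_k)
# Row i of x holds agent i's iterate; s_i tracks the average gradient.
alpha = 0.002  # step size, chosen conservatively for this synthetic example
x = np.zeros((n_agents, dim))
g = np.array([local_grad(i, x[i]) for i in range(n_agents)])
s = g.copy()

for k in range(5000):
    W = mixing_matrix(k)
    x = W @ (x - alpha * s)
    g_new = np.array([local_grad(i, x[i]) for i in range(n_agents)])
    s = W @ s + g_new - g
    g = g_new

# Every agent should agree with the centralized least-squares solution.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print("max deviation from centralized solution:", np.abs(x - x_star).max())
```

Because each mixing matrix is doubly stochastic, the average of the trackers $s_i$ always equals the average of the current local gradients, which is what lets every agent follow an approximation of the global gradient direction despite the changing topology.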
