
A near-optimal stochastic gradient method for decentralized non-convex finite-sum optimization

Abstract

This paper describes a near-optimal stochastic first-order gradient method for decentralized finite-sum minimization of smooth non-convex functions. Specifically, we propose GT-SARAH, which employs a local SARAH-type variance reduction and global gradient tracking to address the stochastic and decentralized nature of the problem. Considering a total number of $N$ cost functions, equally divided over a directed network of $n$ nodes, we show that GT-SARAH finds an $\epsilon$-accurate first-order stationary point in $\mathcal{O}(N^{1/2}\epsilon^{-1})$ gradient computations across all nodes, independent of the network topology, when $n \leq \mathcal{O}(N^{1/2}(1-\lambda)^{3})$, where $(1-\lambda)$ is the spectral gap of the network weight matrix. In this regime, GT-SARAH is thus, to the best of our knowledge, the first decentralized method that achieves the algorithmic lower bound for this class of problems. Moreover, GT-SARAH achieves a non-asymptotic linear speedup, in that the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the near-optimal algorithms for this problem class that process all data at a single node. We also establish the convergence rate of GT-SARAH in other regimes, in terms of the relative sizes of the number of nodes $n$, the total number of functions $N$, and the network spectral gap $(1-\lambda)$. Over an infinite time horizon, we establish the almost sure and mean-squared convergence of GT-SARAH to a first-order stationary point.
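To illustrate the two ingredients named in the abstract, the following is a minimal sketch (not the authors' reference implementation) of a GT-SARAH-style update: each node combines a local SARAH variance-reduced gradient estimator with gradient tracking over a mixing matrix $W$. The least-squares stand-in cost, ring topology, step size, inner-loop length, and iteration count below are illustrative assumptions, not values specified in the paper.

```python
# Hedged sketch of a GT-SARAH-style decentralized update (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 4, 50, 10                      # nodes, local samples per node, dimension (assumed)
A = rng.normal(size=(n, m, d))           # synthetic stand-in data
b = rng.normal(size=(n, m))

def grad(i, j, x):
    """Gradient of the j-th local least-squares term at node i (stand-in smooth cost)."""
    return (A[i, j] @ x - b[i, j]) * A[i, j]

def full_local_grad(i, x):
    """Full gradient of node i's local finite sum."""
    return np.mean([grad(i, j, x) for j in range(m)], axis=0)

# Doubly stochastic mixing matrix for a ring of n nodes (assumed topology).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

alpha, q, T = 0.01, m, 200               # step size, inner-loop length, iterations (assumed)
x = np.tile(rng.normal(size=d), (n, 1))  # local iterates, one row per node
v = np.array([full_local_grad(i, x[i]) for i in range(n)])  # local SARAH estimators
y = v.copy()                             # gradient trackers

for t in range(1, T + 1):
    # Mix with neighbors and descend along the tracked global gradient direction.
    x_new = W @ x - alpha * y
    # SARAH recursion: periodic full local gradient, otherwise an incremental update.
    v_new = np.empty_like(v)
    for i in range(n):
        if t % q == 0:
            v_new[i] = full_local_grad(i, x_new[i])
        else:
            j = rng.integers(m)
            v_new[i] = grad(i, j, x_new[i]) - grad(i, j, x[i]) + v[i]
    # Gradient tracking: mix trackers and add the change in local estimators.
    y = W @ y + v_new - v
    x, v = x_new, v_new

avg_x = x.mean(axis=0)
g = np.mean([full_local_grad(i, avg_x) for i in range(n)], axis=0)
print("norm of the global gradient at the averaged iterate:", np.linalg.norm(g))
```

The tracker update is what lets each node follow an estimate of the network-wide gradient rather than only its local one; the SARAH recursion keeps the per-iteration cost at a single stochastic gradient while periodically resetting the estimator with a full local pass.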
