
Non-ergodic linear convergence of delayed gradient descent under strong convexity and the Polyak-Łojasiewicz condition

Abstract

In this work, we establish a non-ergodic linear convergence estimate for gradient descent with delay $\tau\in\mathbb{N}$ when the cost function is $\mu$-strongly convex and $L$-smooth. This result improves upon the well-known estimates of Arjevani et al. \cite{ASS} and Stich-Karimireddy \cite{SK} in that it is non-ergodic and holds under a weaker assumption on the cost function. Moreover, the admissible range of the learning rate $\eta$ is extended from $\eta\leq 1/(10L\tau)$ to $\eta\leq 1/(4L\tau)$ for $\tau=1$ and to $\eta\leq 3/(10L\tau)$ for $\tau\geq 2$, where $L>0$ is the Lipschitz constant of the gradient of the cost function. Furthermore, we show linear convergence of the cost function under the Polyak-{\L}ojasiewicz\,(PL) condition, for which the admissible learning rate improves further to $\eta\leq 9/(10L\tau)$ for large delay $\tau$. Finally, numerical experiments are provided to confirm the reliability of the analyzed results.
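The iteration studied in the abstract, gradient descent with a fixed delay $\tau$, can be sketched as follows. This is a minimal illustration on a strongly convex quadratic, not the paper's experimental setup; the function names, the test problem, and the choice of constants are assumptions, with the step size taken inside the abstract's range $\eta\leq 3/(10L\tau)$ for $\tau\geq 2$.

```python
import numpy as np

def delayed_gradient_descent(grad, x0, eta, tau, num_iters):
    """Delayed gradient descent: x_{k+1} = x_k - eta * grad(x_{k-tau}),
    using the initial point's gradient while k < tau."""
    history = [np.asarray(x0, dtype=float)]
    for k in range(num_iters):
        delayed_point = history[max(k - tau, 0)]  # gradient evaluated tau steps late
        history.append(history[-1] - eta * grad(delayed_point))
    return history[-1]

# Illustrative test problem (an assumption, not from the paper):
# f(x) = 0.5 * x^T A x is mu-strongly convex and L-smooth with mu = 1, L = 10.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x

tau = 2
L = 10.0
eta = 3.0 / (10.0 * L * tau)  # within the abstract's admissible range for tau >= 2

x_final = delayed_gradient_descent(grad, np.array([5.0, -3.0]), eta, tau, 1000)
print(np.linalg.norm(x_final))  # distance to the minimizer x* = 0; decays linearly
```

Note that despite the stale gradients, the iterates still contract toward the minimizer as long as $\eta$ stays within the delay-dependent range above; taking $\eta$ too large relative to $1/(L\tau)$ can make the recursion unstable.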
