Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the O(ε^{-7/4}) Complexity

Abstract

Nonconvex optimization, with its great demand for fast solvers, is ubiquitous in modern machine learning. This paper studies two simple accelerated gradient methods, restarted accelerated gradient descent (AGD) and the restarted heavy ball (HB) method, for general nonconvex problems under the gradient Lipschitz and Hessian Lipschitz conditions. We establish, with simple proofs, that both algorithms find an ε-approximate first-order stationary point in O(ε^{-7/4}) gradient computations. Our complexity does not hide any polylogarithmic factors, and thus improves over the state-of-the-art bound by a factor of O(log(1/ε)). Our algorithms are simple in the sense that they consist only of Nesterov's classical AGD or Polyak's HB iterations together with a restart mechanism; they need neither negative curvature exploitation nor the minimization of regularized surrogate functions. Our proofs use only elementary analysis and, in contrast with existing analyses, do not invoke the analysis of strongly convex AGD or HB.
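To illustrate the kind of method the abstract describes, here is a minimal Python sketch of AGD iterations wrapped in a restart mechanism. The specific restart rule (resetting the momentum once the cumulative squared movement of the iterates exceeds a threshold B²), the step size `eta`, the momentum weight `theta`, and the function names are illustrative assumptions for the sketch, not the paper's exact parameter choices.

```python
import numpy as np


def restarted_agd(grad, x0, eta=0.01, theta=0.9, B=1.0,
                  max_inner=1000, max_restarts=100, tol=1e-4):
    """Sketch of accelerated gradient descent with a restart mechanism.

    grad -- callable returning the gradient of f at a point
    eta  -- step size (illustrative choice)
    theta -- momentum weight in the Nesterov extrapolation step
    B    -- restart threshold on cumulative iterate movement
            (an assumed rule, not necessarily the paper's exact one)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_restarts):
        x_prev = x.copy()                      # momentum is reset at every restart
        moved = 0.0
        for _ in range(max_inner):
            y = x + theta * (x - x_prev)       # Nesterov extrapolation step
            x_prev, x = x, y - eta * grad(y)   # gradient step at the extrapolated point
            moved += np.linalg.norm(x - x_prev) ** 2
            if moved > B ** 2:                 # restart once the iterates have moved far
                break
        if np.linalg.norm(grad(x)) <= tol:     # ε-approximate stationary point found
            return x
    return x


# Usage example on a simple smooth nonconvex test function
if __name__ == "__main__":
    grad_f = lambda x: x ** 3 - x              # gradient of f(x) = sum(x_i^4/4 - x_i^2/2)
    x_star = restarted_agd(grad_f, np.array([2.0, -1.5]))
    print(x_star)                              # lands near a stationary point (±1 or 0)
```

The point of the sketch is structural: the inner loop is plain momentum iterations, and the only extra ingredient is the restart test, matching the abstract's claim that no negative curvature exploitation or regularized surrogate minimization is involved.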
