
Katyusha: The First Direct Acceleration of Stochastic Gradient Methods

Abstract

We introduce Katyusha, the first direct, primal-only stochastic gradient method with a provably accelerated convergence rate in convex optimization. In contrast, previous methods are either based on dual coordinate descent, which is more restrictive, or on outer-inner loop structures, which make them "blind" to the underlying stochastic nature of the optimization process. Katyusha is the first algorithm that incorporates acceleration directly into stochastic gradient updates. Unlike previous results, Katyusha obtains an optimal convergence rate. It also supports proximal updates, non-Euclidean norm smoothness, non-uniform sampling, and mini-batch sampling. When applied to interesting classes of convex objectives, including smooth objectives (e.g., Lasso, Logistic Regression), strongly convex objectives (e.g., SVM), and non-smooth objectives (e.g., L1SVM), Katyusha improves the best known convergence rates. The main ingredient behind our result is Katyusha momentum, a novel "negative momentum on top of momentum" that can be incorporated into a variance-reduction based algorithm and speed it up. As a result, since variance reduction has been successfully applied to a fast-growing list of practical problems, our paper suggests that in each such case, one had better hurry up and give Katyusha a hug.
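To make the "negative momentum on top of momentum" idea concrete, the following is a minimal sketch of the coupling step described above, in the strongly convex, non-proximal setting: each iterate is a convex combination of a mirror-descent point, a gradient-descent point, and the SVRG-style snapshot, with the snapshot term acting as the retracting (negative) momentum. The ridge-regression instance, the helper names (katyusha, full_grad, comp_grad), and the exact constants are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def katyusha(A, b, lam, n_epochs=20, seed=0):
    """Sketch of an accelerated variance-reduced method in the spirit of Katyusha,
    applied to f(x) = 1/(2n) * ||Ax - b||^2 + (lam/2) * ||x||^2
    (a sigma-strongly-convex, L-smooth finite sum, no proximal term).
    Parameter scalings follow the strongly convex case; constants are assumptions."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.max(np.sum(A * A, axis=1)) + lam        # smoothness bound per component
    sigma = lam                                    # strong-convexity parameter
    m = 2 * n                                      # inner-loop length (one "epoch")
    tau2 = 0.5                                     # Katyusha momentum: weight on the snapshot
    tau1 = min(np.sqrt(n * sigma / (3.0 * L)), 0.5)
    alpha = 1.0 / (3.0 * tau1 * L)

    def full_grad(x):
        return A.T @ (A @ x - b) / n + lam * x

    def comp_grad(i, x):
        return A[i] * (A[i] @ x - b[i]) + lam * x

    x_tilde = np.zeros(d)                          # snapshot point
    y = x_tilde.copy()                             # gradient-descent sequence
    z = x_tilde.copy()                             # mirror-descent sequence

    for _ in range(n_epochs):
        mu = full_grad(x_tilde)                    # full gradient at the snapshot (SVRG anchor)
        y_sum, w_sum, w = np.zeros(d), 0.0, 1.0
        for _ in range(m):
            # Coupling step: the tau2 * x_tilde term retracts the iterate
            # toward the snapshot ("negative momentum on top of momentum").
            x = tau1 * z + tau2 * x_tilde + (1.0 - tau1 - tau2) * y
            i = rng.integers(n)
            grad = mu + comp_grad(i, x) - comp_grad(i, x_tilde)   # variance-reduced gradient
            z = z - alpha * grad                   # mirror-descent-style step
            y = x - grad / (3.0 * L)               # gradient-descent-style step
            y_sum += w * y                         # weighted average for the next snapshot
            w_sum += w
            w *= 1.0 + alpha * sigma
        x_tilde = y_sum / w_sum
    return x_tilde

# Tiny usage example on synthetic data (illustrative only).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 20))
    b = A @ rng.standard_normal(20) + 0.01 * rng.standard_normal(200)
    x_hat = katyusha(A, b, lam=0.1)
    print("objective:", 0.5 * np.mean((A @ x_hat - b) ** 2) + 0.05 * np.sum(x_hat ** 2))
```

The sketch omits the proximal, non-Euclidean, non-uniform-sampling, and mini-batch extensions mentioned in the abstract; those replace the two inner updates with proximal/mirror steps but leave the three-point coupling unchanged.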
