We introduce Katyusha, the first direct stochastic gradient method with an accelerated convergence rate. Given an objective that is an average of n convex, smooth functions, Katyusha converges to an ε-approximate minimizer using O((n + √(nκ))·log(1/ε)) stochastic iterations, where κ is the condition number. Katyusha is a direct, primal-only method. In contrast, previous accelerated stochastic methods are either based on dual coordinate descent, which is more restrictive, or on outer-inner loop constructions, which make them "blind" to the underlying stochastic nature of the optimization process. Katyusha is the first algorithm that incorporates acceleration directly into the stochastic gradient updates. It supports proximal updates, non-Euclidean norm smoothness, non-uniform sampling, and mini-batch sampling. It also improves the best known convergence rates on many interesting classes of convex objectives, including smooth objectives (e.g., Lasso, logistic regression), strongly convex objectives (e.g., SVM), and non-smooth objectives (e.g., L1-SVM). The main ingredient behind our result is Katyusha momentum, a clever "negative momentum on top of momentum" that can be added on top of a variance-reduction-based algorithm to speed it up. Since variance reduction has been successfully applied to a fast-growing list of practical problems, our paper suggests that in each such case, one had better hurry up and give Katyusha a hug.
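To make the "momentum on top of variance reduction" idea concrete, here is a simplified, single-loop sketch of a Katyusha-style update. All function and variable names (`katyusha`, `grads`, `full_grad`) are illustrative, not from the paper; the parameter choices (τ₂ = 1/2, step size α = 1/(3τ₁L), inner epoch length m = 2n) follow the strongly convex setting described in the paper, while the snapshot rule x̃ ← y is a simplification of the paper's weighted-average snapshot.

```python
import numpy as np

def katyusha(grads, full_grad, n, L, sigma, x0, epochs=30, rng=None):
    """Simplified Katyusha sketch: SVRG-style variance reduction plus
    the three-point Katyusha momentum coupling. Assumes each f_i is
    L-smooth and f = (1/n) sum_i f_i is sigma-strongly convex.

    grads(i, x)  -- gradient of the i-th component f_i at x (assumed API)
    full_grad(x) -- gradient of the full average f at x (assumed API)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    tau2 = 0.5                                     # weight of snapshot x~
    tau1 = min(np.sqrt(n * sigma / (3 * L)), 0.5)  # acceleration weight
    alpha = 1.0 / (3 * tau1 * L)                   # mirror-step size for z
    x_tilde = x0.copy()                            # snapshot point
    y = x0.copy()                                  # gradient-step sequence
    z = x0.copy()                                  # mirror-step sequence
    for _ in range(epochs):
        mu = full_grad(x_tilde)                    # full gradient at snapshot
        for _ in range(2 * n):                     # inner epoch length m = 2n
            # Katyusha coupling: the "negative momentum" term tau2 * x_tilde
            # pulls the iterate back toward the snapshot.
            x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
            i = rng.integers(n)
            g = grads(i, x) - grads(i, x_tilde) + mu   # variance-reduced grad
            z = z - alpha * g                      # accelerated (mirror) step
            y = x - g / (3 * L)                    # plain gradient step
        x_tilde = y                                # simplified snapshot update
    return x_tilde
```

As a usage sketch, one could run this on a small ridge-regression problem, where each f_i(x) = ½(aᵢᵀx − bᵢ)² + (σ/2)‖x‖² gives ∇f_i(x) = aᵢ(aᵢᵀx − bᵢ) + σx, and compare the output against the closed-form minimizer.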