We introduce Katyusha, the first direct stochastic gradient method with an accelerated convergence rate. Given an objective that is an average of n convex, smooth functions, Katyusha converges to an ε-approximate minimizer using O((n + √(nκ))·log(1/ε)) stochastic iterations, where κ is the condition number. Katyusha is a direct, primal-only method. In contrast, previous accelerated stochastic methods are either based on dual coordinate descent, which is more restrictive, or on outer-inner loop constructions, which make them "blind" to the underlying stochastic nature of the optimization process. Katyusha is the first algorithm that incorporates acceleration directly into the stochastic gradient updates. It supports proximal updates, non-Euclidean norm smoothness, non-uniform sampling, and mini-batch sampling. It also improves the best known convergence rates on many interesting classes of convex objectives, including smooth objectives (e.g., Lasso, logistic regression), strongly convex objectives (e.g., SVM), and non-smooth objectives (e.g., L1-SVM). The main ingredient behind our result is Katyusha momentum, a clever "negative momentum on top of momentum" that can be added on top of a variance-reduction-based algorithm to speed it up. Since variance reduction has been successfully applied to a fast-growing list of practical problems, our paper suggests that in each such case, one had better hurry up and give Katyusha a hug.
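To make the "momentum on top of variance reduction" idea concrete, here is a simplified, single-loop sketch of a Katyusha-style update. All function and variable names (`katyusha`, `grads`, `full_grad`) are illustrative, not from the paper; the parameter choices (τ₂ = 1/2, step size α = 1/(3τ₁L), inner epoch length m = 2n) follow the strongly convex setting described in the paper, while the snapshot rule x̃ ← y is a simplification of the paper's weighted-average snapshot.

```python
import numpy as np

def katyusha(grads, full_grad, n, L, sigma, x0, epochs=30, rng=None):
    """Simplified Katyusha sketch: SVRG-style variance reduction plus
    the three-point Katyusha momentum coupling. Assumes each f_i is
    L-smooth and f = (1/n) sum_i f_i is sigma-strongly convex.

    grads(i, x)  -- gradient of the i-th component f_i at x (assumed API)
    full_grad(x) -- gradient of the full average f at x (assumed API)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    tau2 = 0.5                                     # weight of snapshot x~
    tau1 = min(np.sqrt(n * sigma / (3 * L)), 0.5)  # acceleration weight
    alpha = 1.0 / (3 * tau1 * L)                   # mirror-step size for z
    x_tilde = x0.copy()                            # snapshot point
    y = x0.copy()                                  # gradient-step sequence
    z = x0.copy()                                  # mirror-step sequence
    for _ in range(epochs):
        mu = full_grad(x_tilde)                    # full gradient at snapshot
        for _ in range(2 * n):                     # inner epoch length m = 2n
            # Katyusha coupling: the "negative momentum" term tau2 * x_tilde
            # pulls the iterate back toward the snapshot.
            x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
            i = rng.integers(n)
            g = grads(i, x) - grads(i, x_tilde) + mu   # variance-reduced grad
            z = z - alpha * g                      # accelerated (mirror) step
            y = x - g / (3 * L)                    # plain gradient step
        x_tilde = y                                # simplified snapshot update
    return x_tilde
```

As a usage sketch, one could run this on a small ridge-regression problem, where each f_i(x) = ½(aᵢᵀx − bᵢ)² + (σ/2)‖x‖² gives ∇f_i(x) = aᵢ(aᵢᵀx − bᵢ) + σx, and compare the output against the closed-form minimizer.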