We design and analyze an algorithm for first-order stochastic optimization of a large class of functions on $\mathbb{R}^d$. In particular, we consider the \emph{variationally coherent} functions, which can be convex or non-convex. The iterates of our algorithm on variationally coherent functions converge almost surely to the global minimizer $x^\star$. Additionally, the very same algorithm with the same hyperparameters, after $T$ iterations, guarantees on convex functions an expected suboptimality gap bounded by $O\big(\|x^\star - x_0\|\, T^{-1/2+\epsilon}\big)$ for any $\epsilon > 0$. It is the first algorithm to achieve both of these properties at the same time. Moreover, the rate for convex functions essentially matches the performance of parameter-free algorithms. Our algorithm is an instance of the Follow The Regularized Leader algorithm, with the added twist of using \emph{rescaled gradients} and time-varying linearithmic regularizers.
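To make the last sentence concrete, the following is a minimal, hedged sketch of a generic FTRL loop of this flavor: each step accumulates gradients rescaled to unit norm and minimizes the linearized losses plus a time-varying linearithmic regularizer. The particular regularizer $\psi_t$ below and the use of a numerical solver for the argmin are illustrative assumptions, not the construction analyzed in the paper.

\begin{verbatim}
# Illustrative sketch (assumptions noted below), not the paper's algorithm.
import numpy as np
from scipy.optimize import minimize

def ftrl_rescaled(grad_oracle, x0, T):
    """Run T FTRL steps; grad_oracle(x) returns a stochastic (sub)gradient at x."""
    theta = np.zeros_like(x0)          # running sum of rescaled gradients
    x = x0.copy()
    for t in range(1, T + 1):
        g = grad_oracle(x)
        g_norm = np.linalg.norm(g)
        if g_norm > 0:
            theta += g / g_norm        # rescaled gradient: divide by its norm

        # Assumed linearithmic regularizer (illustrative choice only):
        #   psi_t(x) = sqrt(t) * [(r + 1) log(r + 1) - r],  r = ||x - x0||
        def ftrl_objective(z):
            r = np.linalg.norm(z - x0)
            return theta @ (z - x0) + np.sqrt(t) * ((r + 1.0) * np.log(r + 1.0) - r)

        # FTRL step: minimize linearized losses plus the current regularizer.
        x = minimize(ftrl_objective, x, method="L-BFGS-B").x
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_star = np.array([2.0, -1.0])
    # Noisy gradients of the convex function f(x) = 0.5 * ||x - x_star||^2.
    noisy_grad = lambda x: (x - x_star) + 0.1 * rng.standard_normal(2)
    print(ftrl_rescaled(noisy_grad, x0=np.zeros(2), T=200))
\end{verbatim}

The rescaling step makes the update scale-free with respect to the gradient magnitudes, while the growing regularizer controls how far the iterates can move from the starting point; both ingredients are named in the abstract, though their exact form here is only a plausible stand-in.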