
Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications

Abstract

The Stochastic Approximation (SA) algorithm introduced by Robbins and Monro in 1951 has been a standard method for solving equations of the form $\mathbf{f}(\boldsymbol{\theta}) = \mathbf{0}$ when only noisy measurements of $\mathbf{f}(\cdot)$ are available. If $\mathbf{f}(\boldsymbol{\theta}) = \nabla J(\boldsymbol{\theta})$ for some function $J(\cdot)$, then SA can also be used to find a stationary point of $J(\cdot)$. At each time $t$, the current guess $\boldsymbol{\theta}_t$ is updated to $\boldsymbol{\theta}_{t+1}$ using a noisy measurement of the form $\mathbf{f}(\boldsymbol{\theta}_t) + \boldsymbol{\xi}_{t+1}$. In much of the literature, it is assumed that the error term $\boldsymbol{\xi}_{t+1}$ has zero conditional mean, and/or that its conditional variance is bounded as a function of $t$ (though not necessarily with respect to $\boldsymbol{\theta}_t$). Over the years, SA has been applied to a variety of areas; the focus in this paper is on convex and nonconvex optimization. As it turns out, in these applications the above assumptions on the measurement error do not always hold. In zero-order methods, the error has neither zero mean nor bounded conditional variance. In the present paper, we extend SA theory to encompass errors with nonzero conditional mean and/or unbounded conditional variance. In addition, we derive estimates for the rate of convergence of the algorithm, and compute the ``optimal step size sequences'' to maximize the estimated rate of convergence.
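For readers unfamiliar with the update rule described above, the following is a minimal sketch of the classical Robbins-Monro iteration in Python. The test function `noisy_f`, the Gaussian noise model, and the step-size sequence $a_t = a/(t+1)^{\alpha}$ are illustrative assumptions for a simple scalar root-finding problem; they are not the settings analyzed in the paper, which concerns biased noise with possibly unbounded variance.

```python
import numpy as np

def robbins_monro_sa(noisy_f, theta0, n_steps=10_000, a=1.0, alpha=1.0, rng=None):
    """Basic Robbins-Monro stochastic approximation (illustrative sketch).

    Iterates theta_{t+1} = theta_t - a_t * (f(theta_t) + xi_{t+1}),
    where noisy_f returns a noisy measurement of f at the current guess
    and a_t = a / (t + 1)**alpha is the step-size sequence.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    for t in range(n_steps):
        step = a / (t + 1) ** alpha          # decaying step size a_t
        theta = theta - step * noisy_f(theta, rng)  # noisy measurement of f(theta_t)
    return theta

# Hypothetical example: find the root of f(theta) = theta - 2 from
# measurements corrupted by zero-mean Gaussian noise.
if __name__ == "__main__":
    noisy_f = lambda theta, rng: (theta - 2.0) + rng.normal(scale=0.5)
    print(robbins_monro_sa(noisy_f, theta0=0.0))  # converges to roughly 2.0
```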
