Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications

Abstract

The Stochastic Approximation (SA) algorithm introduced by Robbins and Monro in 1951 has been a standard method for solving equations of the form $\mathbf{f}(\boldsymbol{\theta}) = \mathbf{0}$ when only noisy measurements of $\mathbf{f}(\cdot)$ are available. If $\mathbf{f}(\boldsymbol{\theta}) = \nabla J(\boldsymbol{\theta})$ for some function $J(\cdot)$, then SA can also be used to find a stationary point of $J(\cdot)$. In much of the literature, it is assumed that the error term $\boldsymbol{\xi}_{t+1}$ has zero conditional mean, and that its conditional variance is bounded as a function of $t$ (though not necessarily with respect to $\boldsymbol{\theta}_t$). Also, for the most part, the emphasis has been on ``synchronous'' SA, whereby, at each time $t$, \textit{every} component of $\boldsymbol{\theta}_t$ is updated. Over the years, SA has been applied to a variety of areas, two of which are the focus of this paper: convex and nonconvex optimization, and Reinforcement Learning (RL). As it turns out, in these applications the above-mentioned assumptions do not always hold. In zero-order methods, the error has neither zero conditional mean nor bounded conditional variance. In the present paper, we extend SA theory to encompass errors with nonzero conditional mean and/or unbounded conditional variance, as well as asynchronous SA. In addition, we derive estimates for the rate of convergence of the algorithm. We then apply the new results to problems in nonconvex optimization and to Markovian SA, a recently emerging area in RL. We prove that SA converges in these situations, and compute the ``optimal step size sequences'' that maximize the estimated rate of convergence.
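For readers unfamiliar with the basic iteration, the following is a minimal sketch of the classical synchronous Robbins-Monro update for root-finding, with a simple zero-mean noise model and an illustrative decaying step size $\alpha_t = a/(t+1)^p$. The function names, gain $a$, and exponent $p$ here are illustrative assumptions; they are not the paper's step-size sequences, and the noise model is far simpler than the biased, unbounded-variance setting the paper treats.

```python
import numpy as np

def robbins_monro(noisy_f, theta0, n_steps=10_000, a=1.0, p=1.0, rng=None):
    """Classical (synchronous) Robbins-Monro stochastic approximation.

    Iterates theta_{t+1} = theta_t - alpha_t * y_t, where y_t is a noisy
    measurement of f(theta_t) and alpha_t = a / (t + 1)**p is a decaying
    step size. The gain a and exponent p are illustrative choices, not the
    "optimal step size sequences" derived in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    for t in range(n_steps):
        alpha = a / (t + 1) ** p          # decaying step size
        y = noisy_f(theta, rng)           # noisy measurement of f(theta)
        theta = theta - alpha * y         # SA update toward a root of f
    return theta

# Toy example: find the root of f(theta) = theta - 2 from measurements
# corrupted by zero-mean Gaussian noise (the simplest noise model; the
# paper covers biased noise with unbounded conditional variance).
if __name__ == "__main__":
    noisy_f = lambda theta, rng: (theta - 2.0) + rng.normal(scale=1.0)
    print(robbins_monro(noisy_f, theta0=0.0))   # approaches 2.0
```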
