The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

Abstract

The paper concerns the stochastic approximation recursion, \[ \theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1}) \,, \quad n \ge 0, \] where the {\em estimates} $\theta_n \in \Re^d$ and $\{\Phi_n\}$ is a Markov chain on a general state space. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated \textit{mean flow} $\tfrac{d}{dt}\vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar{f}(\theta) = \mathrm{E}[f(\theta, \Phi)]$ with $\Phi$ having the stationary distribution of the chain. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3) for the chain: (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error $z_n = (\theta_n - \theta^*)/\sqrt{\alpha_n}$. Moment bounds combined with the CLT imply convergence of the normalized covariance $\mathrm{E}[z_n z_n^T]$ to the asymptotic covariance $\Sigma^\Theta$ appearing in the CLT. (iii) The CLT holds for the normalized version $z^{\mathrm{PR}}_n$ of the averaged parameters $\theta^{\mathrm{PR}}_n$, subject to standard assumptions on the step-size. Moreover, the normalized covariances of both $\theta^{\mathrm{PR}}_n$ and $z^{\mathrm{PR}}_n$ converge to $\Sigma^{\mathrm{PR}}$, the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$ and the Markov chain is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges.
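
To make the recursion concrete, below is a minimal sketch (not from the paper) of the setting described above: a hypothetical scalar linear example $f(\theta, \Phi) = -\theta + \Phi$ driven by a two-state Markov chain, with step size $\alpha_n = n^{-0.8}$, Polyak-Ruppert averaging, and the normalizations $z_n$ and $z^{\mathrm{PR}}_n$ from the abstract. The dynamics, transition matrix, and step-size exponent are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: f(theta, Phi) = -theta + Phi, so the mean flow is
# d/dt vartheta = -vartheta + E[Phi], globally asymptotically stable
# with stationary point theta* = E[Phi].
# Phi is a two-state Markov chain on {-1, +1}; its stationary
# distribution is (1/2, 1/2), so theta* = 0 in this toy setting.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])        # transition matrix of the chain
states = np.array([-1.0, 1.0])    # values taken by Phi_n

N = 200_000
theta = 1.0                       # theta_0
theta_pr = 0.0                    # running Polyak-Ruppert average theta^PR_n
s = 0                             # current state index of the chain

for n in range(1, N + 1):
    alpha = n ** -0.8             # vanishing step size alpha_n = n^{-rho}
    s = rng.choice(2, p=P[s])     # Phi_{n+1}: one step of the Markov chain
    theta += alpha * (-theta + states[s])   # SA recursion
    theta_pr += (theta - theta_pr) / n      # averaged parameters theta^PR_n

theta_star = 0.0
z_n = (theta - theta_star) / np.sqrt(N ** -0.8)   # z_n = (theta_n - theta*)/sqrt(alpha_n)
z_pr = np.sqrt(N) * (theta_pr - theta_star)       # normalized averaged error z^PR_n
print(f"theta_N={theta:.4f}  theta^PR_N={theta_pr:.4f}  "
      f"z_N={z_n:.3f}  z^PR_N={z_pr:.3f}")
```

Under the paper's assumptions, $z_n$ and $z^{\mathrm{PR}}_n$ are the quantities whose distributions and normalized covariances converge to $\Sigma^\Theta$ and $\Sigma^{\mathrm{PR}}$, respectively; this sketch only simulates single sample paths of them.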
