Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Abstract

We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise -- covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$ -- and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).
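To make the setting concrete, the following is a minimal sketch of Markovian linear stochastic approximation in its simplest policy-evaluation form: tabular TD(0) run along a single trajectory, with the last iterate compared against an average of the iterates taken after a burn-in period. It is an illustration under stated assumptions, not the scheme analyzed in the paper. The two-state chain `P`, the rewards `r`, the discount `gamma`, the constant step size, and the burn-in length are all hypothetical choices, and the helper name `td0_markov` is invented for this example.

```python
import random


def td0_markov(P, r, gamma, n, step, burn_in, seed=0):
    """Tabular TD(0) along one Markov trajectory (illustrative sketch).

    Returns the last iterate and the running average of the iterates
    collected after `burn_in` steps. All parameters are assumptions
    chosen for illustration, not values taken from the paper.
    """
    rng = random.Random(seed)
    d = len(r)
    v = [0.0] * d      # current iterate
    avg = [0.0] * d    # running average of post-burn-in iterates
    count = 0
    s = 0              # arbitrary initial state
    for t in range(n):
        # Draw the next state from row s of the transition matrix.
        s_next = rng.choices(range(d), weights=P[s])[0]
        # Linear SA update driven by the observed transition (s, s_next).
        td_error = r[s] + gamma * v[s_next] - v[s]
        v[s] += step * td_error
        if t >= burn_in:
            count += 1
            for i in range(d):
                avg[i] += (v[i] - avg[i]) / count
        s = s_next
    return v, avg


# Hypothetical two-state Markov reward process.
P = [[0.9, 0.1], [0.2, 0.8]]
r = [1.0, 0.0]
gamma = 0.5

# Exact fixed point v* = (I - gamma P)^{-1} r via the 2x2 inverse formula.
a, b = 1 - gamma * P[0][0], -gamma * P[0][1]
c, e = -gamma * P[1][0], 1 - gamma * P[1][1]
det = a * e - b * c
v_star = [(e * r[0] - b * r[1]) / det, (-c * r[0] + a * r[1]) / det]

last, avg = td0_markov(P, r, gamma, n=200_000, step=0.02, burn_in=100_000)
err_last = max(abs(last[i] - v_star[i]) for i in range(2))
err_avg = max(abs(avg[i] - v_star[i]) for i in range(2))
print("last-iterate error:", err_last)
print("averaged error:    ", err_avg)
```

In this sketch the noise is Markovian because consecutive updates use consecutive states of the same trajectory rather than independent samples; the burn-in before averaging is one plausible reading of "suitably averaged" and is assumed here rather than taken from the abstract.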
