Stochastic approximation for efficient LSTD and least squares regression
We propose stochastic approximation based methods with randomization of samples in two settings: policy evaluation using the least squares temporal difference (LSTD) algorithm, and solving the least squares problem. We consider a "big data" regime where both the dimension, d, of the data and the number, T, of training samples are large. Through finite-time analyses, we provide performance bounds for these methods both in high probability and in expectation. In particular, we show that, with probability 1-\delta, an \epsilon-approximation of the (LSTD or least squares regression) solution can be computed with O(d\ln(1/\delta)/\epsilon^2) complexity, irrespective of the number of samples T. We demonstrate the practicality of our solution scheme for LSTD empirically by combining it with the LSPI algorithm in a traffic signal control application.
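To make the idea of a stochastic approximation scheme with randomized samples concrete, here is a minimal sketch for the least squares case. It is an illustration under stated assumptions, not the authors' exact algorithm: the function name `sa_least_squares`, the step size `step_c / sqrt(n)`, and the use of iterate averaging are all choices made for this example. It shows why the cost per iteration is O(d) and does not depend on T: each step reads one uniformly sampled data point.

```python
import random


def sa_least_squares(xs, ys, num_iters, step_c=0.5, seed=0):
    """Hypothetical sketch: stochastic approximation for least squares.

    xs: list of T feature vectors (each a list of d floats)
    ys: list of T targets
    Returns the running average of the iterates.
    """
    rng = random.Random(seed)
    d = len(xs[0])
    theta = [0.0] * d  # current iterate
    avg = [0.0] * d    # running average of iterates (assumed choice)

    for n in range(1, num_iters + 1):
        # Randomization of samples: pick one index uniformly from the T samples.
        i = rng.randrange(len(xs))
        x, y = xs[i], ys[i]

        # Residual of the current iterate on the sampled point: O(d) work.
        residual = y - sum(t * xj for t, xj in zip(theta, x))

        # Diminishing step size (an assumption made for this sketch).
        gamma = step_c / (n ** 0.5)

        # Stochastic gradient step on the squared error of the sampled point.
        theta = [t + gamma * residual * xj for t, xj in zip(theta, x)]

        # Incremental update of the running mean of the iterates.
        avg = [a + (t - a) / n for a, t in zip(avg, theta)]

    return avg
```

The total work is `num_iters` times O(d), so the number of iterations needed for a target accuracy, rather than the sample count T, drives the cost. An analogous sketch for LSTD would presumably replace the regression residual with a temporal-difference error, though the abstract does not spell out that update.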