Preconditioned Temporal Difference Learning
Abstract
LSTD is numerically unstable for some ergodic Markov chains that visit some states far more often than others, because the matrix that LSTD accumulates has a large condition number. In this paper, we propose a variant of temporal difference learning with high data efficiency. A class of preconditioned temporal difference learning algorithms is also proposed to speed up the new method. This class includes LSPE as well as several new data-efficient algorithms. The data efficiency of these algorithms is validated by learning an absorbing Markov chain. The asymptotic properties of the new algorithms are also analyzed.
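The conditioning claim can be illustrated with a small numerical sketch. It is not taken from the paper; it assumes tabular (one-hot) features, for which the expected LSTD matrix is D(I - gamma P), with D the diagonal matrix of stationary visit probabilities. The two example chains, the discount factor, and the helper names `stationary` and `expected_lstd_matrix` are hypothetical choices made for illustration. The final diagonal rescaling is a generic preconditioner and is not necessarily one of the preconditioners the paper proposes.

```python
import numpy as np

def stationary(P, iters=1000):
    """Approximate the stationary distribution of P by power iteration."""
    d = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        d = d @ P
    return d / d.sum()

def expected_lstd_matrix(P, gamma):
    """Expected LSTD matrix D (I - gamma P), assuming tabular features."""
    d = stationary(P)
    return np.diag(d) @ (np.eye(P.shape[0]) - gamma * P)

n, gamma = 5, 0.9  # assumed chain size and discount factor

# Chain 1: every state is visited equally often.
P_uniform = np.full((n, n), 1.0 / n)

# Chain 2: one state receives almost all visits (assumed probabilities).
visit = np.array([0.996, 0.001, 0.001, 0.001, 0.001])
P_skewed = np.tile(visit, (n, 1))

cond_uniform = np.linalg.cond(expected_lstd_matrix(P_uniform, gamma))
cond_skewed = np.linalg.cond(expected_lstd_matrix(P_skewed, gamma))

# Generic diagonal preconditioning: left-multiply by D^{-1}.
D_inv = np.diag(1.0 / stationary(P_skewed))
cond_scaled = np.linalg.cond(D_inv @ expected_lstd_matrix(P_skewed, gamma))

print(cond_uniform, cond_skewed, cond_scaled)
```

Under these assumptions the skewed chain yields a markedly larger condition number than the uniform one, and the diagonal rescaling lowers it again, which is the kind of effect a preconditioner is meant to achieve.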
