Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven to be challenging. In this paper we provide a convergence analysis of Neural TD Learning with a projection onto $B(\theta_0, \omega)$, a ball of fixed radius $\omega$ around the initial point $\theta_0$. We show an approximation bound of $O(\epsilon) + \tilde{O}(1/\sqrt{m})$, where $\epsilon$ is the approximation quality of the best neural network in $B(\theta_0, \omega)$ and $m$ is the width of all hidden layers in the network.
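To make the projected update concrete, here is a minimal sketch of one projected Neural TD(0) step in Python/NumPy, using a one-hidden-layer ReLU value network. All sizes, the learning rate, the $1/\sqrt{m}$ output scaling, the sampled transition, and the helper names (`value`, `grads`, `project`) are illustrative assumptions for this sketch, not the paper's exact setup; it only illustrates the semi-gradient step followed by an L2 projection back onto $B(\theta_0, \omega)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: state dimension d, hidden width m, ball radius omega.
d, m, omega = 4, 64, 1.0
gamma, lr = 0.9, 1e-2

# One-hidden-layer ReLU value network V(s; theta) = a^T relu(W s) / sqrt(m).
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.normal(size=m) / np.sqrt(m)
theta0 = np.concatenate([W.ravel(), a])  # initial point theta_0

def value(W, a, s):
    return a @ np.maximum(W @ s, 0.0) / np.sqrt(m)

def grads(W, a, s):
    # Gradients of V(s; theta) with respect to W and a.
    h = np.maximum(W @ s, 0.0)
    mask = (h > 0).astype(float)
    gW = np.outer(a * mask, s) / np.sqrt(m)
    ga = h / np.sqrt(m)
    return gW, ga

def project(W, a):
    # Project theta onto the L2 ball B(theta_0, omega).
    theta = np.concatenate([W.ravel(), a])
    diff = theta - theta0
    norm = np.linalg.norm(diff)
    if norm > omega:
        theta = theta0 + diff * (omega / norm)
    return theta[:m * d].reshape(m, d), theta[m * d:]

# One projected TD(0) step on a (synthetic) transition (s, r, s').
s, s_next, r = rng.normal(size=d), rng.normal(size=d), 1.0
delta = r + gamma * value(W, a, s_next) - value(W, a, s)  # TD error
gW, ga = grads(W, a, s)
W, a = W + lr * delta * gW, a + lr * delta * ga           # semi-gradient step
W, a = project(W, a)                                      # projection onto B(theta_0, omega)
```

The projection step is what keeps the iterates within a fixed radius of the initialization, which is the regime in which the stated approximation bound is derived.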