
On Convergence of Gradient Expected Sarsa(λ)

Abstract

We study the convergence of Expected Sarsa(λ) with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning. Furthermore, based on the convex-concave saddle-point framework, we propose a convergent Gradient Expected Sarsa(λ) (GES(λ)) algorithm. Our theoretical analysis shows that GES(λ) converges to the optimal solution at a linear rate, comparable to existing state-of-the-art gradient temporal difference (GTD) learning algorithms. In addition, we develop a Lyapunov-function technique to investigate how the step-size influences the finite-time performance of GES(λ); this technique can potentially be generalized to other GTD algorithms. Finally, we conduct experiments to verify the effectiveness of GES(λ).
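To make the setting concrete, the following is a minimal Python sketch of a GTD-style saddle-point update with eligibility traces and an Expected Sarsa backup. It is an illustrative approximation under assumed notation (features phi, primal weights theta, dual weights w, importance ratio rho, step-sizes alpha and beta), not the paper's exact GES(λ) update, whose precise form is not given here.

```python
import numpy as np

def ges_lambda_step(theta, w, e, phi, phi_next_expected, reward,
                    gamma, lam, rho, alpha, beta):
    """One saddle-point update on a single off-policy transition.

    Illustrative GTD(lambda)-style update, NOT the authors' exact algorithm.

    phi               : features of the current state-action pair
    phi_next_expected : expected next features under the target policy,
                        sum_a' pi(a'|s') * phi(s', a')  (Expected Sarsa backup)
    rho               : importance-sampling ratio for off-policy correction
    alpha, beta       : primal / dual step-sizes
    """
    # Accumulating eligibility trace with off-policy correction
    e = rho * (gamma * lam * e + phi)

    # Expected Sarsa TD error under linear function approximation
    delta = reward + gamma * theta @ phi_next_expected - theta @ phi

    # Dual ascent on w, which tracks the projected TD error
    w = w + beta * (delta * e - (w @ phi) * phi)

    # Primal descent on theta with a gradient-correction term
    theta = theta + alpha * (delta * e
                             - gamma * (1 - lam) * (w @ e) * phi_next_expected)

    return theta, w, e
```

In this sketch theta performs descent on the primal objective while w performs ascent on the dual variable; this two-timescale saddle-point structure is the standard mechanism by which gradient TD methods restore stability under off-policy sampling.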
