
Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning

Long Yang
Yu Zhang
Qian Zheng
Pengfei Li
Gang Pan
Abstract

Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q(σ, λ) is the first approach that unifies them with eligibility traces through the sampling degree σ. However, it is limited to the tabular case; for large-scale learning, Q(σ, λ) is too expensive, as it requires a huge volume of tables to store value functions accurately. To address this problem, we propose GQ(σ, λ), which extends tabular Q(σ, λ) with linear function approximation. We prove the convergence of GQ(σ, λ). Empirical results on some standard domains show that GQ(σ, λ), with a combination of full-sampling and pure-expectation, achieves better performance than pure full-sampling and pure-expectation methods.
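The abstract describes the algorithm only at a high level. Below is a minimal, hypothetical sketch of how a single Q(σ) semi-gradient update with linear function approximation and an accumulating eligibility trace might look, mixing a sampled backup (σ = 1, Sarsa/Q-learning style) with an expected backup (σ = 0, Expected Sarsa style). The function names, arguments, and hyperparameters are assumptions for illustration, not the paper's exact GQ(σ, λ) specification.

```python
import numpy as np

def q_sigma_lambda_update(theta, trace, phi_sa, reward, next_s, next_a,
                          feature, policy_probs, actions,
                          alpha=0.1, gamma=0.99, lam=0.8, sigma=0.5):
    """One semi-gradient update blending full-sampling (sigma=1) and
    pure-expectation (sigma=0) backups; names/params are illustrative."""
    q_sa = theta @ phi_sa                                   # current estimate Q(s, a)
    q_next = np.array([theta @ feature(next_s, a) for a in actions])
    pi_next = policy_probs(next_s)                          # target-policy probabilities at next_s
    sampled = q_next[next_a]                                # sampled backup term
    expected = pi_next @ q_next                             # expected backup term
    target = reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)
    delta = target - q_sa                                   # TD error
    trace = gamma * lam * trace + phi_sa                    # accumulating eligibility trace
    theta = theta + alpha * delta * trace                   # linear parameter update
    return theta, trace
```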
