Off-policy Distributional Q($\lambda$): Distributional RL without Importance Sampling

8 February 2024

Abstract

We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($\lambda$) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. These unique properties distinguish distributional Q($\lambda$) from existing alternatives such as distributional Retrace. We characterize the algorithmic properties of distributional Q($\lambda$) and validate theoretical insights with tabular experiments. We show that distributional Q($\lambda$)-C51, a combination of Q($\lambda$) with the C51 agent, exhibits promising results on deep RL benchmarks.
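The link between dropping importance sampling and signed measures can be made concrete with a small sketch. It assumes (this is not taken from the paper) that the distributional target follows the familiar off-policy Q($\lambda$) correction lifted to return distributions via pushforwards, with trace coefficient $c_t = \lambda$ and no importance ratio. On a trajectory of length $n$ this gives $\sum_{t=0}^{n-1} \lambda^t (b_{G_{t+1},\gamma^{t+1}})_\# \eta(x_{t+1}, \pi) - \sum_{t=1}^{n-1} \lambda^t (b_{G_t,\gamma^t})_\# \eta(x_t, a_t)$, where $b_{g,s}(z) = g + s z$ and $G_t$ is the discounted partial return. The weights always sum to 1, but individual atoms can end up with negative weight. The Dirac-atom representation, the function names, and the toy MDP below are hypothetical illustrations.

```python
from collections import defaultdict

# Hypothetical representation: a (signed) measure over returns is a dict
# {return_location: weight}. Assumed form of the distributional Q(lambda)
# target with trace c_t = lam (no importance ratios); illustration only.


def pushforward(measure, shift, scale):
    """Push a measure through z -> shift + scale * z (the map b_{shift,scale})."""
    out = defaultdict(float)
    for z, w in measure.items():
        out[shift + scale * z] += w
    return dict(out)


def add_scaled(acc, measure, coef):
    """acc += coef * measure, atom by atom."""
    for z, w in measure.items():
        acc[z] += coef * w


def mix_over_policy(eta, state, pi):
    """eta(x, pi) = sum_b pi(b|x) * eta(x, b)."""
    out = defaultdict(float)
    for b, p in pi[state].items():
        add_scaled(out, eta[(state, b)], p)
    return dict(out)


def q_lambda_target(eta, pi, trajectory, final_state, gamma, lam):
    """trajectory: [(x_t, a_t, r_t)] for t = 0..n-1; final_state is x_n."""
    target = defaultdict(float)
    g, disc = 0.0, 1.0  # G_t and gamma**t, starting at t = 0
    trace = 1.0         # c_1 * ... * c_t with every c_s = lam
    states = [x for x, _, _ in trajectory] + [final_state]
    for t, (x_t, a_t, r_t) in enumerate(trajectory):
        if t >= 1:
            # Negative correction: -trace * (b_{G_t, gamma^t})_# eta(x_t, a_t).
            add_scaled(target, pushforward(eta[(x_t, a_t)], g, disc), -trace)
        g, disc = g + disc * r_t, disc * gamma  # advance to G_{t+1}, gamma^{t+1}
        # Positive term: trace * (b_{G_{t+1}, gamma^{t+1}})_# eta(x_{t+1}, pi).
        nxt = mix_over_policy(eta, states[t + 1], pi)
        add_scaled(target, pushforward(nxt, g, disc), trace)
        trace *= lam
    return dict(target)


# Toy example: one state "s", two actions; the behavior policy chose "a" twice.
eta = {("s", "a"): {0.0: 1.0}, ("s", "b"): {4.0: 1.0}}
pi = {"s": {"a": 0.1, "b": 0.9}}
trajectory = [("s", "a", 1.0), ("s", "a", 1.0)]
target = q_lambda_target(eta, pi, trajectory, "s", gamma=0.5, lam=0.9)

print(round(sum(target.values()), 6))  # total mass: 1.0
print(round(target[1.0], 6))           # negative weight: -0.8
```

Because $\pi(a \mid s) = 0.1$ is smaller than $\lambda = 0.9$, the atom at return 1.0 receives net weight $0.1 - 0.9 = -0.8$ while the total mass stays 1, so the target is a signed measure rather than a probability distribution. Retrace-style methods instead use $c_t = \lambda \min(1, \pi(a_t \mid x_t)/\mu(a_t \mid x_t))$; this sketch deliberately omits that ratio.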
