Off-policy Distributional Q(λ): Distributional RL without Importance Sampling

8 February 2024
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
arXiv: 2402.05766
Abstract

We introduce off-policy distributional Q(λ), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q(λ) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distinguish distributional Q(λ) from other existing alternatives such as distributional Retrace. We characterize the algorithmic properties of distributional Q(λ) and validate theoretical insights with tabular experiments. We show how distributional Q(λ)-C51, a combination of Q(λ) with the C51 agent, exhibits promising results on deep RL benchmarks.
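
For context on the non-distributional precursor: below is a minimal tabular sketch of the classical expected-value Q(λ) backup that the paper lifts to return distributions. The function and variable names are illustrative assumptions, not the paper's code, and the paper's actual distributional operator is not reproduced from the abstract. The point to notice is that each TD error is weighted by a fixed (γλ)^k trace, with no importance-sampling ratios anywhere; plausibly, when Q-values are replaced by probability measures, the subtraction inside the TD error becomes a difference of measures that need not be nonnegative, which is the kind of interaction with signed measures the abstract mentions.

```python
import numpy as np

def q_lambda_update(Q, trajectory, target_policy, gamma=0.99, lam=0.9, lr=0.1):
    """One forward-view Q(lambda) backup over a finite trajectory.

    Q:             array of shape (n_states, n_actions)
    trajectory:    list of (state, action, reward, next_state) transitions,
                   collected under an arbitrary behavior policy
    target_policy: array of shape (n_states, n_actions), pi(a | s)
    """
    T = len(trajectory)

    # TD errors corrected toward the target policy's expected value:
    # delta_t = r_t + gamma * E_pi[Q(s_{t+1}, .)] - Q(s_t, a_t)
    deltas = []
    for (s, a, r, s_next) in trajectory:
        v_next = target_policy[s_next] @ Q[s_next]
        deltas.append(r + gamma * v_next - Q[s, a])

    # Each visited pair receives the (gamma * lambda)-discounted sum of all
    # subsequent TD errors; the trace coefficient is the constant lam, so no
    # importance-sampling ratios appear in the update.
    for t, (s, a, _, _) in enumerate(trajectory):
        g = sum((gamma * lam) ** (k - t) * deltas[k] for k in range(t, T))
        Q[s, a] += lr * g
    return Q

# Toy usage: 4 states, 2 actions, uniform target policy, one short
# behavior trajectory of (state, action, reward, next_state) tuples.
Q = np.zeros((4, 2))
pi = np.full((4, 2), 0.5)
traj = [(0, 1, 1.0, 1), (1, 0, 0.0, 2), (2, 1, 1.0, 3)]
Q = q_lambda_update(Q, traj, pi)
print(Q)
```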
