Distributional Reinforcement Learning via Moment Matching

24 July 2020
Thanh Tang Nguyen
Sunil R. Gupta
Svetha Venkatesh
Abstract

We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, rather than only the expectation, of the total return. We formulate a method that learns a finite set of statistics from each return distribution via neural networks, as in (Bellemare, Dabney, and Munos 2017; Dabney et al. 2018b). Existing distributional RL methods, however, constrain the learned statistics to predefined functional forms of the return distribution, which is both restrictive in representation and makes the predefined statistics difficult to maintain. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. We establish sufficient conditions for contraction of the distributional Bellman operator and provide a finite-sample analysis of the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines and sets a new record among non-distributed agents on the Atari games.
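To make the idea concrete, the following is a minimal sketch (not the authors' reference code) of the kind of loss the abstract describes: a squared-MMD objective between a set of learned deterministic particles of the return distribution and particles of its Bellman target. The Gaussian kernel with a bandwidth mixture, the particle count, and all variable names are illustrative assumptions.

```python
# Sketch of an MMD^2 loss between predicted return particles and Bellman-target
# particles; kernel choice and shapes are assumptions, not the paper's exact setup.
import torch

def gaussian_kernel(x, y, bandwidths=(1.0, 2.0, 4.0)):
    # x: [B, N], y: [B, M] -> pairwise kernel values, shape [B, N, M]
    diff2 = (x.unsqueeze(2) - y.unsqueeze(1)) ** 2
    return sum(torch.exp(-diff2 / (2.0 * h ** 2)) for h in bandwidths)

def mmd2_loss(particles, target_particles):
    # Empirical squared MMD between predicted particles and (detached) Bellman targets.
    k_xx = gaussian_kernel(particles, particles).mean(dim=(1, 2))
    k_yy = gaussian_kernel(target_particles, target_particles).mean(dim=(1, 2))
    k_xy = gaussian_kernel(particles, target_particles).mean(dim=(1, 2))
    return (k_xx + k_yy - 2.0 * k_xy).mean()

# Hypothetical usage, assuming a network that outputs N particles per (state, action):
#   particles = net(s)[torch.arange(B), a]                      # [B, N]
#   with torch.no_grad():
#       target = r.unsqueeze(1) + gamma * target_net(s2)[torch.arange(B), a2]
#   loss = mmd2_loss(particles, target)
```

Because the kernel expansion contains all polynomial orders, minimizing this objective implicitly matches all orders of moments between the predicted distribution and its Bellman target, which is the interpretation given in the abstract.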
