SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning

Conference on Uncertainty in Artificial Intelligence (UAI), 2021

22 February 2021

Hannes Eriksson

D. Basu

Mina Alibeigi

Christos Dimitrakakis



Abstract

In this paper, we consider risk-sensitive sequential decision-making in model-based Reinforcement Learning (RL). Our contributions are two-fold. First, we introduce a novel and coherent quantification of risk, namely composite risk, which quantifies the joint effect of aleatory and epistemic risk during the learning process. Existing works consider either aleatory or epistemic risk individually, or an additive combination of the two. We prove that the additive formulation is a particular case of the composite risk in which the epistemic risk measure is replaced with an expectation. Thus, the composite risk provides an estimate that is more sensitive to both aleatory and epistemic sources of uncertainty than the individual and additive formulations. Second, we propose a bootstrapping method, SENTINEL-K, for performing distributional RL. SENTINEL-K uses an ensemble of $K$ learners to estimate the return distribution. We use Follow The Regularised Leader (FTRL) to aggregate the return distributions of the $K$ learners and to estimate the composite risk. We experimentally verify that SENTINEL-K estimates the return distribution better and, when used with the composite risk estimate, demonstrates better risk-sensitive performance than state-of-the-art risk-sensitive and distributional RL algorithms.
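The nesting described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on assumptions the abstract does not state: both risk measures are taken to be lower-tail CVaR, each of the $K$ learners is represented by a plain array of sampled returns, and the FTRL step uses a negative-entropy regulariser, which is the standard instance that reduces to exponential weights. The names `cvar`, `composite_risk`, `ftrl_weights`, `alpha_aleatory`, `alpha_epistemic`, `cum_losses` and `eta` are hypothetical.

```python
import numpy as np


def cvar(values, alpha, weights=None):
    """Lower-tail CVaR: weighted mean of the worst alpha-fraction of values."""
    values = np.asarray(values, dtype=float)
    if weights is None:
        weights = np.full(values.shape, 1.0 / values.size)
    order = np.argsort(values)
    v = values[order]
    w = np.asarray(weights, dtype=float)[order]
    w = w / w.sum()
    # Probability mass of each atom that lies inside the lowest alpha tail.
    mass_before = np.cumsum(w) - w
    tail = np.clip(alpha - mass_before, 0.0, w)
    return float(np.dot(tail, v) / alpha)


def composite_risk(samples, alpha_aleatory, alpha_epistemic, weights=None):
    """Apply an epistemic risk measure across learners to per-learner aleatory risks.

    samples: array of shape (K, N), N sampled returns for each of K learners
             (an assumed representation of the ensemble).
    weights: optional length-K weights over learners.
    """
    per_learner = [cvar(s, alpha_aleatory) for s in samples]  # aleatory risk
    return cvar(per_learner, alpha_epistemic, weights)        # epistemic risk


def ftrl_weights(cum_losses, eta=1.0):
    """FTRL over the simplex with an (assumed) entropic regulariser.

    With this regulariser the FTRL solution is the exponential-weights rule
    w_k proportional to exp(-eta * cumulative_loss_k). How the loss is defined
    is left open here.
    """
    z = -eta * np.asarray(cum_losses, dtype=float)
    z -= z.max()  # shift for numerical stability before exponentiating
    w = np.exp(z)
    return w / w.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five hypothetical learners with different means, 1000 return samples each.
    samples = rng.normal(loc=np.arange(5.0)[:, None], scale=1.0, size=(5, 1000))
    w = ftrl_weights(cum_losses=[0.2, 0.1, 0.4, 0.3, 0.5], eta=2.0)
    print(composite_risk(samples, 0.25, 0.25, weights=w))
    # alpha_epistemic = 1 turns the outer measure into a weighted expectation.
    print(composite_risk(samples, 0.25, 1.0, weights=w))
```

Setting `alpha_epistemic` to 1 makes the outer CVaR a plain expectation over learners, which mirrors the special case the abstract mentions. For any smaller value the estimate can only decrease, since it concentrates on the learners with the worst aleatory risk.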
