Universal Regular Conditional Distributions

Abstract

We introduce a deep learning model that can universally approximate regular conditional distributions (RCDs). The proposed model operates in three phases: first, it linearizes inputs from a given metric space $\mathcal{X}$ to $\mathbb{R}^d$ via a feature map; next, a deep feedforward neural network processes these linearized features; finally, the network's outputs are transformed to the $1$-Wasserstein space $\mathcal{P}_1(\mathbb{R}^D)$ via a probabilistic extension of the attention mechanism of Bahdanau et al.\ (2014). Our model, called the \textit{probabilistic transformer (PT)}, can approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on compact sets, quantitatively. We identify two ways in which the PT avoids the curse of dimensionality when approximating $\mathcal{P}_1(\mathbb{R}^D)$-valued functions. The first strategy builds functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ which can be efficiently approximated by a PT, uniformly on any given compact subset of $\mathbb{R}^d$. In the second approach, given any function $f$ in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$, we build compact subsets of $\mathbb{R}^d$ whereon $f$ can be efficiently approximated by a PT.
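The three-phase architecture described above can be illustrated with a minimal PyTorch sketch. Everything here is an illustrative assumption rather than the paper's exact construction: the class name, layer widths, the identity placeholder for the feature map, and the choice of a softmax over trainable atoms as the measure-valued output layer (one natural reading of "probabilistic attention", in which the model outputs a convex combination of Dirac measures on $\mathbb{R}^D$).

```python
import torch
import torch.nn as nn


class ProbabilisticTransformer(nn.Module):
    """Hypothetical sketch of the three-phase PT architecture.

    Phase 1: a feature map linearizes inputs into R^d (identity used as a placeholder).
    Phase 2: a deep feedforward network maps R^d to n_atoms mixing scores.
    Phase 3: a softmax turns the scores into weights of a measure
             sum_k w_k * delta_{y_k}, with trainable atoms y_k in R^D.
    """

    def __init__(self, d: int, D: int, n_atoms: int = 32, width: int = 64):
        super().__init__()
        self.feature_map = nn.Identity()            # placeholder feature map X -> R^d
        self.ffnn = nn.Sequential(                  # deep feedforward core
            nn.Linear(d, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, n_atoms),
        )
        self.atoms = nn.Parameter(torch.randn(n_atoms, D))  # trainable atoms y_k in R^D

    def forward(self, x):
        scores = self.ffnn(self.feature_map(x))     # (batch, n_atoms)
        weights = torch.softmax(scores, dim=-1)     # mixture weights summing to 1
        # The predicted measure is sum_k weights[:, k] * delta_{atoms[k]};
        # we return its finite representation (weights, atoms).
        return weights, self.atoms


# Usage: a map from R^2 into P_1(R^3), represented by measures supported on 32 atoms.
pt = ProbabilisticTransformer(d=2, D=3, n_atoms=32)
w, y = pt(torch.randn(5, 2))
print(w.shape, y.shape)  # torch.Size([5, 32]) torch.Size([32, 3])
```

Under this reading, training would minimize a Wasserstein-type discrepancy between the predicted finitely supported measure and the target conditional law; that loss is not shown here.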
