
Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions

Abstract

A fundamental challenge for any intelligent system is prediction: given inputs $X_1, \ldots, X_\tau$, can you predict the outcomes $Y_1, \ldots, Y_\tau$? The KL divergence $\mathbf{d}_{\mathrm{KL}}$ provides a natural measure of prediction quality, but the majority of deep learning research looks only at the marginal predictions per input $X_t$. In this technical report we propose a scoring rule $\mathbf{d}_{\mathrm{KL}}^\tau$, parameterized by $\tau \in \mathbb{N}$, that evaluates the joint predictions at $\tau$ inputs simultaneously. We show that the commonly used $\tau = 1$ can be insufficient to drive good decisions in many settings of interest. We also show that, as $\tau$ grows, performing well according to $\mathbf{d}_{\mathrm{KL}}^\tau$ recovers universal guarantees for any possible decision. Finally, we provide problem-dependent guidance on the scale of $\tau$ for which our score provides sufficient guarantees for good performance.
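The abstract does not spell out $\mathbf{d}_{\mathrm{KL}}^\tau$ in full; the sketch below is a minimal toy illustration, assuming the natural reading that $\mathbf{d}_{\mathrm{KL}}^\tau$ measures KL divergence from the true joint distribution of $(Y_1, \ldots, Y_\tau)$ to the agent's joint prediction. The environment and both agents here are hypothetical constructions, not taken from the paper: two agents make identical (and perfect) marginal predictions, yet only one captures the dependence between outcomes, so the $\tau = 1$ score cannot separate them while the $\tau = 2$ score can.

```python
import numpy as np

def kl(p, q):
    """KL divergence d_KL(p || q), in nats, for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy environment: a latent coin bias theta is 0.1 or 0.9 with equal prior
# probability, and Y_1, Y_2 are i.i.d. Bernoulli(theta) conditioned on theta.
thetas = np.array([0.1, 0.9])
prior = np.array([0.5, 0.5])

# True distributions; joint outcomes ordered (0,0), (0,1), (1,0), (1,1).
true_marginal = np.array([prior @ (1 - thetas), prior @ thetas])  # [0.5, 0.5]
true_joint = np.array([
    prior @ (thetas ** (y1 + y2) * (1 - thetas) ** (2 - y1 - y2))
    for y1 in (0, 1) for y2 in (0, 1)
])  # [0.41, 0.09, 0.09, 0.41]

# Agent A predicts the correct marginals but treats Y_1, Y_2 as independent;
# Agent B predicts the correct exchangeable joint.
agent_a_joint = np.full(4, 0.25)
agent_b_joint = true_joint.copy()
agent_a_marginal = agent_b_marginal = np.array([0.5, 0.5])

# tau = 1: both agents look perfect (score 0 for each).
print(kl(true_marginal, agent_a_marginal), kl(true_marginal, agent_b_marginal))
# tau = 2: the joint score separates them (~0.22 nats vs. 0).
print(kl(true_joint, agent_a_joint), kl(true_joint, agent_b_joint))
```

Under marginal evaluation the two agents are indistinguishable, even though Agent A's independence assumption misrepresents the environment's uncertainty; this is the kind of gap the abstract's claim about $\tau = 1$ being "insufficient to drive good decisions" points at.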
