Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions
- UQCV

A fundamental challenge for any intelligent system is prediction: given some inputs X_1, …, X_τ, can you predict the outcomes Y_1, …, Y_τ? The KL divergence d_KL provides a natural measure of prediction quality, but the majority of deep learning research looks only at the marginal predictions per input X_t. In this technical report we propose a scoring rule d_KL^τ, parameterized by τ ∈ ℕ, that evaluates the joint predictions at τ inputs simultaneously. We show that the commonly used τ = 1 can be insufficient to drive good decisions in many settings of interest. We also show that, as τ grows, performing well according to d_KL^τ recovers universal guarantees for any possible decision. Finally, we provide problem-dependent guidance on the scale of τ for which our score provides sufficient guarantees for good performance.
View on arXiv
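
As a rough illustration of the gap between marginal (τ = 1) and joint (τ > 1) evaluation, the sketch below contrasts a Monte Carlo estimate of the joint log loss over τ inputs with the sum of per-input marginal log losses, assuming the predictive distribution is represented by an ensemble treated as posterior samples. The function names and toy data are illustrative assumptions, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_log_loss(member_probs, y, tau):
    """Monte Carlo estimate of the joint negative log-likelihood over tau inputs.

    member_probs: (n_members, tau, n_classes) class probabilities that each
        ensemble member assigns at the tau test inputs.
    y: (tau,) realized class labels.

    The joint predictive probability of the label sequence is the average, over
    ensemble members, of the product of that member's per-input probabilities
    (members are treated as posterior samples).
    """
    per_input = member_probs[:, np.arange(tau), y]   # (n_members, tau)
    joint_per_member = per_input.prod(axis=1)        # (n_members,)
    return -np.log(joint_per_member.mean())

def marginal_log_loss(member_probs, y, tau):
    """Sum of per-input (tau = 1) log losses under the marginal predictive."""
    marginal = member_probs.mean(axis=0)             # (tau, n_classes)
    return -np.log(marginal[np.arange(tau), y]).sum()

# Toy example: 10 ensemble members, tau = 5 inputs, 3 classes.
n_members, tau, n_classes = 10, 5, 3
logits = rng.normal(size=(n_members, tau, n_classes))
member_probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
y = rng.integers(n_classes, size=tau)

print("joint   log loss:", joint_log_loss(member_probs, y, tau))
print("marginal log loss:", marginal_log_loss(member_probs, y, tau))
```

Because the joint score multiplies each member's probabilities across the τ inputs before averaging over members, it is sensitive to how predictive uncertainty is correlated across inputs; the sum of marginal log losses discards that correlation, which is the distinction the abstract's d_KL^τ is designed to capture.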