Bayesian Hypernetworks

Abstract

We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork $h$ is a neural network which learns to transform a simple noise distribution, $p(\epsilon) = \mathcal{N}(0, \mathbf{I})$, to a distribution $q(\theta) := q(h(\epsilon))$ over the parameters $\theta$ of another neural network (the "primary network"). We train $q$ with variational inference, using an invertible $h$ to enable efficient estimation of the variational lower bound on the posterior $p(\theta \mid \mathcal{D})$ via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling from $q(\theta)$. In practice, Bayesian hypernets can provide a better defense against adversarial examples than dropout, and also exhibit competitive performance on a suite of tasks which evaluate model uncertainty, including regularization, active learning, and anomaly detection.
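
For concreteness, below is a minimal, hypothetical PyTorch sketch of the setup the abstract describes: an invertible hypernetwork $h$ maps noise $\epsilon \sim \mathcal{N}(0, \mathbf{I})$ to the weights $\theta$ of a small primary network, and the change-of-variables formula yields $\log q(\theta)$ for a sampling-based ELBO estimate. The diagonal affine $h$ here is the simplest invertible choice (the paper's hypernets use more expressive invertible transformations); all dimensions, names, and the fixed observation noise are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineHypernet(nn.Module):
    """Invertible hypernetwork h(eps) = mu + exp(log_sigma) * eps.
    Diagonal affine map => tractable log|det dh/deps| = sum(log_sigma)."""
    def __init__(self, num_params):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_params))
        self.log_sigma = nn.Parameter(torch.full((num_params,), -3.0))

    def forward(self):
        eps = torch.randn_like(self.mu)               # eps ~ N(0, I)
        theta = self.mu + self.log_sigma.exp() * eps  # theta = h(eps)
        log_p_eps = -0.5 * (eps.pow(2) + math.log(2 * math.pi)).sum()
        log_q = log_p_eps - self.log_sigma.sum()      # change of variables
        return theta, log_q

def primary_forward(x, theta, in_dim=1, hidden=16):
    """Primary network: a 1-hidden-layer MLP whose weights come from theta."""
    n1 = in_dim * hidden
    W1 = theta[:n1].view(hidden, in_dim)
    b1 = theta[n1:n1 + hidden]
    W2 = theta[n1 + hidden:n1 + 2 * hidden].view(1, hidden)
    b2 = theta[-1:]
    return F.linear(torch.tanh(F.linear(x, W1, b1)), W2, b2)

# Toy 1-d regression data (illustrative).
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)

num_params = 1 * 16 + 16 + 16 * 1 + 1  # W1, b1, W2, b2
hyper = AffineHypernet(num_params)
opt = torch.optim.Adam(hyper.parameters(), lr=1e-2)

for step in range(1000):
    opt.zero_grad()
    theta, log_q = hyper()                             # sample theta ~ q
    log_prior = -0.5 * (theta.pow(2) + math.log(2 * math.pi)).sum()
    resid = primary_forward(x, theta) - y
    log_lik = -0.5 * resid.pow(2).sum() / 0.1 ** 2     # assumed noise sigma = 0.1
    elbo = log_lik + log_prior - log_q
    (-elbo).backward()                                 # maximize the ELBO
    opt.step()
```

At test time, cheap i.i.d. posterior samples come from repeated calls to `hyper()`, each giving a fresh $\theta$ for the primary network; averaging the resulting predictions approximates the posterior predictive.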
