Global inducing point variational posteriors for Bayesian neural networks
- BDL
We derive the optimal conditional approximate posterior over the top-layer weights in a Bayesian neural network for regression, and show that it exhibits strong dependencies on the lower-layer weights. We adapt this result to develop a correlated approximate posterior over the weights at all layers in a Bayesian neural network, which can naturally be extended to deep Gaussian processes. Our approximate posterior uses learned "global" inducing points, which are defined only at the input layer and propagated through the network to obtain inducing inputs at subsequent layers. By contrast, standard "local" inducing point methods from the deep Gaussian process literature optimize a separate set of inducing inputs at every layer, and thus do not model correlations across layers. Our method gives state-of-the-art performance for a variational Bayesian method, without data augmentation or posterior tempering, on CIFAR-10.
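The core mechanism above, global inducing inputs defined only at the input layer and pushed through the network, can be illustrated with a minimal sketch. This is not the authors' implementation; the network, widths, and the `propagate` helper are hypothetical, and the inducing inputs are random here rather than learned:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical setup: 5 input dims, two hidden layers of width 10.
layer_widths = [5, 10, 10]
weights = [rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
           for d_in, d_out in zip(layer_widths[:-1], layer_widths[1:])]

# "Global" scheme: M inducing inputs live only at the input layer.
M = 8
Z_global = rng.normal(size=(M, layer_widths[0]))  # learned in practice

def propagate(Z, weights):
    """Push global inducing inputs through the network, yielding
    the inducing locations seen at every layer."""
    locations = [Z]
    for W in weights:
        Z = relu(Z @ W)
        locations.append(Z)
    return locations

inducing_per_layer = propagate(Z_global, weights)
for l, Z_l in enumerate(inducing_per_layer):
    print(f"layer {l}: inducing locations with shape {Z_l.shape}")

# By contrast, a "local" scheme would optimize an independent Z_l of
# shape (M, layer_widths[l]) at every layer, so the inducing locations
# at deeper layers would carry no dependence on the weights below.
```

Because each layer's inducing locations are a function of the sampled lower-layer weights, the resulting approximate posterior couples the layers, which is exactly the cross-layer correlation that per-layer local inducing points cannot express.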
View on arXiv