Explicit natural gradient updates for Cholesky factor in Gaussian variational approximation

Abstract

Stochastic gradient methods have enabled variational inference for high-dimensional models and large data sets. However, the steepest ascent direction in the parameter space of a statistical model is given not by the commonly used Euclidean gradient but by the natural gradient, which premultiplies the Euclidean gradient by the inverse of the Fisher information matrix. Use of natural gradients can improve convergence significantly, but inverting the Fisher information matrix is daunting in high dimensions. In Gaussian variational approximation, natural gradient updates of the natural parameters (expressed in terms of the mean and precision matrix) of the Gaussian distribution can be derived analytically, but they do not ensure that the precision matrix remains positive definite. To tackle this issue, we consider the Cholesky decomposition of the covariance or precision matrix and derive explicit natural gradient updates of the Cholesky factor by finding the inverse of the Fisher information matrix analytically; these updates depend only on the first, rather than the second, derivative of the log posterior density. Efficient natural gradient updates of the Cholesky factor are also derived under sparsity constraints that incorporate different posterior independence structures.
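
As a rough illustration of the setting described above (not the paper's derivation), the sketch below fits a Gaussian variational approximation q(theta) = N(mu, C C^T), with C a lower-triangular Cholesky factor, by stochastic gradient ascent on the evidence lower bound via the reparameterization trick. The mean update uses the well-known analytic natural gradient (the Fisher information with respect to mu is Sigma^{-1}, so the natural gradient is Sigma times the Euclidean gradient); the Cholesky update uses a plain Euclidean gradient as a placeholder, since the paper's explicit natural-gradient formula for C is not stated in the abstract. The function name fit_gaussian_vi and its interface are hypothetical.

```python
# Hypothetical sketch of Gaussian variational approximation with a Cholesky-factor
# parameterization; NOT the explicit natural-gradient updates derived in the paper.
import numpy as np

def fit_gaussian_vi(log_post_grad, dim, n_iter=5000, step=5e-3, seed=0):
    """Fit q(theta) = N(mu, C C^T) given only the gradient of the log posterior
    density (first derivatives, as emphasized in the abstract)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    C = np.eye(dim)                         # lower-triangular Cholesky factor of Sigma
    for _ in range(n_iter):
        z = rng.standard_normal(dim)
        theta = mu + C @ z                  # reparameterization trick: theta ~ q
        g = log_post_grad(theta)            # gradient of log posterior at theta
        Sigma = C @ C.T
        # Natural gradient for the mean: the Fisher information w.r.t. mu is
        # Sigma^{-1}, so premultiplying the Euclidean gradient by its inverse
        # gives Sigma @ g.
        mu = mu + step * (Sigma @ g)
        # Euclidean ELBO gradient w.r.t. C (placeholder for the paper's explicit
        # natural-gradient update): reparameterized model term plus entropy term.
        # The 1/diag(C) entropy term keeps the diagonal positive for small steps.
        grad_C = np.tril(np.outer(g, z)) + np.diag(1.0 / np.diag(C))
        C = np.tril(C + step * grad_C)      # keep C lower triangular
    return mu, C

# Toy usage: approximate a Gaussian "posterior" N(1, 0.5 I) in three dimensions.
if __name__ == "__main__":
    grad = lambda theta: -(theta - 1.0) / 0.5
    mu_hat, C_hat = fit_gaussian_vi(grad, dim=3)
    print(mu_hat)             # roughly [1, 1, 1]
    print(C_hat @ C_hat.T)    # roughly 0.5 * I
```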
