Stochastic gradient methods have enabled variational inference for high-dimensional models and large data sets. However, the direction of steepest ascent in the parameter space of a statistical model is not given by the commonly used Euclidean gradient, but by the natural gradient, which premultiplies the Euclidean gradient by the inverse of the Fisher information matrix. Using natural gradients in optimization can improve convergence significantly, but inverting the Fisher information matrix is daunting in high dimensions. Here we consider structured variational approximations with a minimal conditional exponential family representation, which include highly flexible mixtures of exponential family distributions that can fit skewed or multimodal posteriors. We derive complete natural gradient updates for this class of models which, albeit more complex than the natural gradient updates presented prior to this article, account fully for the dependence between the mixing distribution and the distributions of the components. Further experiments will be carried out to evaluate the performance of the complete natural gradient updates.
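For reference, a minimal sketch of the natural gradient alluded to above, in notation assumed here (variational parameters $\lambda$, variational density $q_\lambda$, objective $\mathcal{L}$; none of these symbols appear in the abstract itself):
$$
\widetilde{\nabla}_\lambda \mathcal{L} \;=\; F(\lambda)^{-1}\,\nabla_\lambda \mathcal{L},
\qquad
F(\lambda) \;=\; \mathbb{E}_{q_\lambda}\!\left[\nabla_\lambda \log q_\lambda(\theta)\,\nabla_\lambda \log q_\lambda(\theta)^{\top}\right],
$$
so the update premultiplies the Euclidean gradient by the inverse Fisher information matrix $F(\lambda)^{-1}$, which is what becomes costly in high dimensions.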