
Markovian Score Climbing: Variational Inference with KL(p||q)

Neural Information Processing Systems (NeurIPS), 2020
Abstract

Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions q and then finds the member of that family that is closest to the exact posterior p. Traditionally, VI algorithms minimize the "exclusive KL" KL(q||p), often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL(p||q), which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL. Consider a valid MCMC method, a Markov chain whose stationary distribution is p. The algorithm we develop iteratively samples the chain z[k], and then uses those samples to follow the score function of the variational approximation, ∇ log q(z[k]), with a Robbins-Monro step-size schedule. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final estimates. In a variant that ties the variational approximation directly to the Markov chain, MSC further provides a new algorithm that melds VI and MCMC. We illustrate convergence on a toy model and demonstrate the utility of MSC on Bayesian probit regression for classification as well as a stochastic volatility model for financial data.
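The loop described in the abstract (advance a Markov chain that targets p, then take a score step on q with a decaying step size) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional Gaussian target, the Gaussian family q = N(m, s^2), the random-walk Metropolis kernel standing in for "a valid MCMC method", the step-size constants, and the hypothetical function names `log_p_unnorm`, `mh_step`, and `msc`.

```python
import math
import random


def log_p_unnorm(z, mu=3.0, sigma=1.0):
    """Unnormalized log-density of an assumed toy target p = N(mu, sigma^2)."""
    return -0.5 * ((z - mu) / sigma) ** 2


def mh_step(z, rng, step=1.0):
    """One random-walk Metropolis step (assumed kernel); p is its stationary distribution."""
    proposal = z + step * rng.gauss(0.0, 1.0)
    log_accept = log_p_unnorm(proposal) - log_p_unnorm(z)
    # Accept with probability min(1, p(proposal) / p(z)).
    if rng.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return z


def msc(num_iters=50000, seed=0):
    """Sketch of score climbing for q = N(m, s^2), parameterized by (m, log s)."""
    rng = random.Random(seed)
    m, log_s = 0.0, 0.0  # initial variational parameters
    z = 0.0              # initial chain state z[0]
    for k in range(num_iters):
        # Advance the chain: z[k] is (approximately) a draw from p.
        z = mh_step(z, rng)
        # Score of q at z[k]: gradient of log q(z) with respect to (m, log s).
        s2 = math.exp(2.0 * log_s)
        grad_m = (z - m) / s2
        grad_log_s = (z - m) ** 2 / s2 - 1.0
        # Robbins-Monro schedule: sum of eps diverges, sum of eps^2 converges.
        eps = 0.1 / (k + 1) ** 0.6
        m += eps * grad_m
        log_s += eps * grad_log_s
    return m, math.exp(log_s)


if __name__ == "__main__":
    m, s = msc()
    print(f"fitted q: mean={m:.3f}, std={s:.3f}")
```

For a Gaussian q, the inclusive KL is minimized by matching the mean and variance of p, so on this toy target the fitted mean and standard deviation should drift toward 3 and 1. Note that the sketch only ever evaluates p up to a normalizing constant, inside the Metropolis acceptance ratio.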
