Markovian Score Climbing: Variational Inference with KL(p||q)

Neural Information Processing Systems (NeurIPS), 2020

23 March 2020

Fredrik Lindsten

Abstract

Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions $q$ and then finds the member of that family that is closest to the exact posterior $p$. Traditionally, VI algorithms minimize the "exclusive KL" $\mathrm{KL}(q\|p)$, often for computational convenience. Recent research, however, has also focused on the "inclusive KL" $\mathrm{KL}(p\|q)$, which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL. Consider a valid MCMC method, that is, a Markov chain whose stationary distribution is $p$. The algorithm we develop iteratively samples the chain $z[k]$ and then uses those samples to follow the score function of the variational approximation, $\nabla \log q(z[k])$, with a Robbins-Monro step-size schedule. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final estimates. In a variant that ties the variational approximation directly to the Markov chain, MSC further provides a new algorithm that melds VI and MCMC. We illustrate convergence on a toy model and demonstrate the utility of MSC on Bayesian probit regression for classification as well as on a stochastic volatility model for financial data.
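The sample-then-climb loop described in the abstract can be sketched in a few lines. This is a minimal, illustrative sketch under assumptions not taken from the paper: a one-dimensional Gaussian target $p = \mathcal{N}(2, 0.5^2)$, a Gaussian family $q(z; m, s) = \mathcal{N}(m, e^{2s})$, a fixed random-walk Metropolis-Hastings kernel standing in for the "valid MCMC method", and the step sizes $\epsilon_k = 0.1\,k^{-0.7}$, which satisfy the Robbins-Monro conditions $\sum_k \epsilon_k = \infty$ and $\sum_k \epsilon_k^2 < \infty$. All names (`log_p`, `mh_step`, `score_q`, `markovian_score_climbing`) are hypothetical. The update rests on the identity $\nabla_\lambda \mathrm{KL}(p\|q) = -\mathbb{E}_{p}[\nabla_\lambda \log q(z; \lambda)]$, so stepping along the score at chain samples descends the inclusive KL. The variant that ties the kernel to $q$ is not shown.

```python
import math
import random

# Toy target (an assumption for illustration): p = N(MU_P, SIGMA_P^2).
MU_P, SIGMA_P = 2.0, 0.5


def log_p(z):
    """Unnormalized log density of the toy target p."""
    return -0.5 * ((z - MU_P) / SIGMA_P) ** 2


def mh_step(z, rng, step=0.5):
    """One random-walk Metropolis-Hastings step; p is its stationary distribution."""
    z_prop = z + step * rng.gauss(0.0, 1.0)
    log_accept = log_p(z_prop) - log_p(z)
    # Accept with probability min(1, p(z_prop) / p(z)).
    if rng.random() < math.exp(min(0.0, log_accept)):
        return z_prop
    return z


def score_q(z, m, s):
    """Gradient of log q(z; m, s) for q = N(m, exp(s)^2), w.r.t. (m, s)."""
    var = math.exp(2.0 * s)
    d_m = (z - m) / var
    d_s = (z - m) ** 2 / var - 1.0
    return d_m, d_s


def markovian_score_climbing(num_iters=100_000, seed=0):
    """Hypothetical MSC loop: one MCMC step, then one score step on (m, s)."""
    rng = random.Random(seed)
    m, s = 0.0, 0.0  # variational parameters lambda = (mean, log std)
    z = 0.0          # current state of the Markov chain
    for k in range(1, num_iters + 1):
        z = mh_step(z, rng)      # z[k] drawn from the kernel given z[k-1]
        eps = 0.1 * k ** -0.7    # Robbins-Monro step size
        d_m, d_s = score_q(z, m, s)
        m += eps * d_m           # climb the score of q at z[k]
        s += eps * d_s
    return m, math.exp(s)


m, sigma = markovian_score_climbing()
print(f"m = {m:.3f}, sigma = {sigma:.3f}")
```

For a Gaussian family, the inclusive-KL optimum matches the target's mean and variance, so `m` and `sigma` should settle near 2.0 and 0.5 in this toy setting; no bias from self-normalized importance weights enters, because every sample comes from a chain that leaves $p$ invariant.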
