
On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions

Abstract

Kullback-Leibler (KL) divergence is one of the most important divergence measures between probability distributions. In this paper, we investigate the properties of KL divergence between multivariate Gaussian distributions. First, for any two $n$-dimensional Gaussian distributions $\mathcal{N}_1$, $\mathcal{N}_2$, we prove that when $KL(\mathcal{N}_2\|\mathcal{N}_1)\leq \varepsilon\ (\varepsilon>0)$ the supremum of $KL(\mathcal{N}_1\|\mathcal{N}_2)$ is $\dfrac{1}{2}\left(\dfrac{1}{-W_{0}(-e^{-(1+2\varepsilon)})}-\log \dfrac{1}{-W_{0}(-e^{-(1+2\varepsilon)})} -1 \right)$, where $W_0$ is the principal branch of the Lambert $W$ function. For small $\varepsilon$, the supremum is $\varepsilon + 2\varepsilon^{1.5} + O(\varepsilon^2)$. This quantifies the approximate symmetry of small KL divergence between Gaussians. We also find the infimum of $KL(\mathcal{N}_1\|\mathcal{N}_2)$ when $KL(\mathcal{N}_2\|\mathcal{N}_1)\geq M\ (M>0)$. We give the conditions when the supremum and infimum can be attained. Second, for any three $n$-dimensional Gaussians $\mathcal{N}_1$, $\mathcal{N}_2$ and $\mathcal{N}_3$, we find an upper bound of $KL(\mathcal{N}_1\|\mathcal{N}_3)$ if $KL(\mathcal{N}_1\|\mathcal{N}_2)\leq \varepsilon_1$ and $KL(\mathcal{N}_2\|\mathcal{N}_3)\leq \varepsilon_2$ for $\varepsilon_1,\varepsilon_2\ge 0$. For small $\varepsilon_1$ and $\varepsilon_2$, the upper bound is $3\varepsilon_1+3\varepsilon_2+2\sqrt{\varepsilon_1\varepsilon_2}+o(\varepsilon_1)+o(\varepsilon_2)$. This reveals that KL divergence between Gaussians follows a relaxed triangle inequality. Importantly, all the bounds in our theorems are independent of the dimension $n$.
Finally, we discuss applications of our theorems: explaining a counterintuitive phenomenon of flow-based models, deriving a deep anomaly detection algorithm, and extending a one-step robustness guarantee to multiple steps in safe reinforcement learning.
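
To make the first bound concrete, the following is a minimal numerical sketch, not code from the paper. It evaluates the stated supremum for a given $\varepsilon$ and compares it with a pair of one-dimensional Gaussians. The function names (`neg_w0`, `kl_reverse_supremum`, `kl_gauss_1d`) are hypothetical. Rather than calling a library Lambert $W$ routine, the sketch uses bisection, relying on the fact that $t=-W_{0}(-e^{-(1+2\varepsilon)})$ is the root of $t-\log t-1=2\varepsilon$ in $(0,1]$.

```python
import math


def neg_w0(eps, iters=200):
    """Return t = -W_0(-exp(-(1 + 2*eps))) for eps >= 0.

    Assumption of this sketch: instead of a library Lambert W call, solve
    t - log(t) - 1 = 2*eps on (0, 1] by bisection. The left-hand side is
    decreasing on (0, 1], and exp(-(1 + 2*eps)) is a valid lower bracket.
    """
    target = 2.0 * eps
    lo, hi = math.exp(-(1.0 + 2.0 * eps)), 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # (mid - 1) - log1p(mid - 1) equals mid - log(mid) - 1
        if (mid - 1.0) - math.log1p(mid - 1.0) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kl_reverse_supremum(eps):
    """Supremum of KL(N1||N2) given KL(N2||N1) <= eps, per the stated formula."""
    t = neg_w0(eps)
    return 0.5 * (1.0 / t - math.log(1.0 / t) - 1.0)


def kl_gauss_1d(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p||q) for univariate Gaussians p and q."""
    return 0.5 * (var_p / var_q + (mu_q - mu_p) ** 2 / var_q
                  - 1.0 + math.log(var_q / var_p))


if __name__ == "__main__":
    eps = 0.01
    t = neg_w0(eps)
    # N1 = N(0, 1), N2 = N(0, t): equal means, variance ratio t
    print("KL(N2||N1) =", kl_gauss_1d(0.0, t, 0.0, 1.0))
    print("KL(N1||N2) =", kl_gauss_1d(0.0, 1.0, 0.0, t))
    print("supremum   =", kl_reverse_supremum(eps))
```

In this sketch the pair $\mathcal{N}_1=\mathcal{N}(0,1)$, $\mathcal{N}_2=\mathcal{N}(0,t)$ has forward divergence $\varepsilon$ and a reverse divergence equal to the value of the formula, which is a quick sanity check on the expression rather than a statement of the paper's attainment conditions.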
