
Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

Abstract

We show that the square Hellinger distance between two Bayesian networks on the same directed graph, $G$, is subadditive with respect to the neighborhoods of $G$. Namely, if $P$ and $Q$ are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, $H^2(P,Q)$, between $P$ and $Q$ is upper bounded by the sum, $\sum_v H^2(P_{\{v\} \cup \Pi_v}, Q_{\{v\} \cup \Pi_v})$, of the square Hellinger distances between the marginals of $P$ and $Q$ on every node $v$ and its parents $\Pi_v$ in the DAG. Importantly, our bound does not involve the conditionals but the marginals of $P$ and $Q$. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks $P$ and $Q$ on the same (but potentially unknown) DAG satisfy $P=Q$ vs $d_{\rm TV}(P,Q)>\epsilon$ can be performed from $\tilde{O}(|\Sigma|^{3/4(d+1)} \cdot n/\epsilon^2)$ samples, where $d$ is the maximum in-degree of the DAG and $\Sigma$ the domain of each variable of the Bayesian networks. If $P$ and $Q$ are defined on potentially different and potentially unknown trees, the sample complexity becomes $\tilde{O}(|\Sigma|^{4.5} n/\epsilon^2)$, whose dependence on $n, \epsilon$ is optimal up to logarithmic factors. Lastly, if $P$ and $Q$ are product distributions over $\{0,1\}^n$ and $Q$ is known, the sample complexity becomes $O(\sqrt{n}/\epsilon^2)$, which is optimal up to constant factors.
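The subadditivity claim can be checked numerically on a toy instance. The sketch below (all CPDs are hypothetical choices, not from the paper) builds two Bayesian networks $P$ and $Q$ on the chain $v_1 \to v_2 \to v_3$ over binary variables, computes $H^2(P,Q) = 1 - \sum_x \sqrt{P(x)Q(x)}$ on the full joint, and compares it against the sum of the square Hellinger distances of the neighborhood marginals $\{v_1\}$, $\{v_1,v_2\}$, $\{v_2,v_3\}$:

```python
import itertools
import math

def hellinger_sq(p, q):
    # H^2(P,Q) = 1 - sum_x sqrt(P(x) * Q(x))
    keys = set(p) | set(q)
    return 1.0 - sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in keys)

def marginal(joint, idxs):
    # marginalize a joint distribution onto the coordinates in idxs
    m = {}
    for x, pr in joint.items():
        key = tuple(x[i] for i in idxs)
        m[key] = m.get(key, 0.0) + pr
    return m

def chain_joint(p1, p2, p3):
    # joint distribution of the chain v1 -> v2 -> v3:
    # Pr[x1, x2, x3] = p1(x1) * p2(x2 | x1) * p3(x3 | x2)
    return {(x1, x2, x3): p1[x1] * p2[x1][x2] * p3[x2][x3]
            for x1, x2, x3 in itertools.product([0, 1], repeat=3)}

# hypothetical toy CPDs for P and Q (assumptions for illustration only)
P = chain_joint({0: 0.6, 1: 0.4},
                {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}},
                {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}})
Q = chain_joint({0: 0.5, 1: 0.5},
                {0: {0: 0.6, 1: 0.4}, 1: {0: 0.3, 1: 0.7}},
                {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}})

lhs = hellinger_sq(P, Q)
# neighborhoods {v} ∪ Π_v for the chain: {v1}, {v1, v2}, {v2, v3}
rhs = (hellinger_sq(marginal(P, [0]), marginal(Q, [0]))
       + hellinger_sq(marginal(P, [0, 1]), marginal(Q, [0, 1]))
       + hellinger_sq(marginal(P, [1, 2]), marginal(Q, [1, 2])))

print(f"H^2(P,Q) = {lhs:.6f} <= sum of local H^2 = {rhs:.6f}")
assert lhs <= rhs + 1e-12
```

Note that each local term involves only a marginal on a node and its parents, so in practice each can be estimated from samples without ever materializing the full joint.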
