
Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu

Abstract

We provide finite-sample guarantees for the classical Chow-Liu algorithm (IEEE Trans.~Inform.~Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution $P$ on $\Sigma^n$ and a tree $T$ on $n$ nodes, we say $T$ is an $\varepsilon$-approximate tree for $P$ if there is a $T$-structured distribution $Q$ such that $D(P\,\|\,Q)$ is at most $\varepsilon$ larger than the smallest KL divergence achievable by any tree-structured distribution for $P$. We show that if $P$ itself is tree-structured, then the Chow-Liu algorithm, run with the plug-in estimator for mutual information on $\widetilde{O}(|\Sigma|^3 n \varepsilon^{-1})$ i.i.d.~samples, outputs an $\varepsilon$-approximate tree for $P$ with constant probability. In contrast, for a general $P$ (which may not be tree-structured), $\Omega(n^2 \varepsilon^{-2})$ samples are necessary to find an $\varepsilon$-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart~(STOC, 2018): we prove that for three random variables $X, Y, Z$, each over $\Sigma$, testing whether $I(X; Y \mid Z)$ is $0$ or $\geq \varepsilon$ is possible with $\widetilde{O}(|\Sigma|^3 / \varepsilon)$ samples. Finally, we show that for a specific tree $T$, with $\widetilde{O}(|\Sigma|^2 n \varepsilon^{-1})$ samples from a distribution $P$ over $\Sigma^n$, one can efficiently learn the closest $T$-structured distribution in KL divergence by applying the add-1 estimator at each node.
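
Below is a minimal sketch, assuming NumPy, of the two ingredients the abstract refers to: the Chow-Liu procedure run with plug-in (empirical) mutual-information estimates, and an add-1 (Laplace) estimator for the conditional distributions along a fixed tree. Function names such as `plug_in_mutual_information`, `chow_liu_tree`, and `add_one_conditional` are illustrative, not from the paper, and the sketch does not reproduce the paper's sample-complexity analysis or thresholds.

```python
import numpy as np
from itertools import combinations


def plug_in_mutual_information(x, y, k):
    """Plug-in estimate of I(X; Y) in nats from paired samples over {0, ..., k-1}."""
    joint = np.zeros((k, k))
    np.add.at(joint, (x, y), 1.0)
    joint /= len(x)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    mi = 0.0
    for a in range(k):
        for b in range(k):
            if joint[a, b] > 0:
                mi += joint[a, b] * np.log(joint[a, b] / (px[a] * py[b]))
    return mi


def chow_liu_tree(samples, k):
    """Edges of a maximum-weight spanning tree under empirical pairwise MI.

    samples: (m, n) integer array of m i.i.d. samples from an n-variable
    distribution over alphabet {0, ..., k-1}.
    """
    _, n = samples.shape
    weight = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        w = plug_in_mutual_information(samples[:, i], samples[:, j], k)
        weight[i, j] = weight[j, i] = w
    # Prim's algorithm for a maximum-weight spanning tree, starting from node 0.
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    best = weight[0].copy()          # best edge weight from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree endpoint realizing that best weight
    edges = []
    for _ in range(n - 1):
        u = int(np.argmax(np.where(visited, -np.inf, best)))
        edges.append((int(parent[u]), u))
        visited[u] = True
        improve = weight[u] > best
        best = np.where(improve, weight[u], best)
        parent = np.where(improve, u, parent)
    return edges


def add_one_conditional(child, parent, k):
    """Add-1 (Laplace) estimate of P(child | parent), returned as a k x k row-stochastic matrix."""
    counts = np.ones((k, k))
    np.add.at(counts, (parent, child), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)
```

As a usage sketch: on samples from a tree-structured $P$, `chow_liu_tree` returns the edge set of the learned tree, and applying `add_one_conditional` along those edges (with any node as the root) defines a $T$-structured distribution of the kind the abstract describes.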
