Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity

Abstract

We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $\zeta$). The best known lower bound was $\exp(\Omega(k))$. It is known that $n\geq 2k-1$ is necessary and sufficient for identification. We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition, (b) a novel way of bounding the condition number of key matrices called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
