Near-optimal-sample estimators for spherical Gaussian mixtures

Statistical and machine-learning algorithms are frequently applied to high-dimensional data. In many of these applications, data is scarce and often much more costly than computation time. We provide the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. For mixtures of any $k$ $d$-dimensional spherical Gaussians, we derive an intuitive spectral estimator whose sample complexity and running time are both significantly lower than previously known. The constant factor is polynomial for the sample complexity and exponential for the time complexity, again much smaller than previously known. We also prove a lower bound on the number of samples needed by any algorithm; hence the sample complexity is near-optimal in the number of dimensions. We also derive a simple estimator for one-dimensional mixtures, with explicit bounds on its sample complexity and running time. Our other technical contributions include a faster algorithm for choosing, from a set of distributions, a density estimate that minimizes the distance to an unknown underlying distribution.
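To make the density-selection problem concrete, here is a minimal sketch of the classical Scheffé tournament on a finite domain. This is the textbook baseline, not the faster algorithm the abstract refers to. The function names, the discrete domain, and the toy candidates are assumptions chosen for illustration.

```python
def scheffe_winner(p, q, samples):
    """Return 0 if p wins the Scheffe test against q, else 1.

    p and q are lists of probabilities over the domain {0, ..., n-1};
    samples is a list of observed domain elements.
    """
    # Scheffe set: the points where p puts strictly more mass than q.
    scheffe_set = {x for x in range(len(p)) if p[x] > q[x]}
    # Empirical mass of the Scheffe set under the observed samples.
    empirical = sum(1 for s in samples if s in scheffe_set) / len(samples)
    p_mass = sum(p[x] for x in scheffe_set)
    q_mass = sum(q[x] for x in scheffe_set)
    # The candidate whose mass on the set is closer to the empirical mass wins.
    return 0 if abs(p_mass - empirical) <= abs(q_mass - empirical) else 1


def scheffe_tournament(candidates, samples):
    """Run all pairwise Scheffe tests; return the index with the most wins."""
    wins = [0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if scheffe_winner(candidates[i], candidates[j], samples) == 0:
                wins[i] += 1
            else:
                wins[j] += 1
    return max(range(len(candidates)), key=lambda i: wins[i])


# Toy data (hypothetical): 100 observations with empirical frequencies 0.7, 0.2, 0.1.
samples = [0] * 70 + [1] * 20 + [2] * 10
candidates = [
    [0.1, 0.2, 0.7],
    [0.7, 0.2, 0.1],
    [1 / 3, 1 / 3, 1 / 3],
]
best = scheffe_tournament(candidates, samples)
```

The tournament compares every pair of candidates, so the number of tests grows quadratically with the number of candidates. That cost is what a faster selection procedure would aim to reduce.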