A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k

Abstract

Learning a Gaussian mixture model (GMM) is a fundamental problem in machine learning, learning theory, and statistics. One notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the density $f$ of the unknown distribution from which we draw samples. The distance between $\mathcal{M}$ and $f$ is typically measured in the total variation or $L_1$-norm. We give an algorithm for learning a mixture of $k$ univariate Gaussians that is nearly optimal for any fixed $k$. The sample complexity of our algorithm is $\tilde{O}(\frac{k}{\epsilon^2})$ and the running time is $(k \cdot \log\frac{1}{\epsilon})^{O(k^4)} + \tilde{O}(\frac{k}{\epsilon^2})$. It is well known that this sample complexity is optimal (up to logarithmic factors), and it was already achieved by prior work. However, the best known time complexity for properly learning a $k$-GMM was $\tilde{O}(\frac{1}{\epsilon^{3k-1}})$. In particular, the dependence between $\frac{1}{\epsilon}$ and $k$ was exponential. We significantly improve this dependence by replacing the $\frac{1}{\epsilon}$ term with a $\log\frac{1}{\epsilon}$ while only increasing the exponent moderately. Hence, for any fixed $k$, the $\tilde{O}(\frac{k}{\epsilon^2})$ term dominates our running time, and thus our algorithm runs in time nearly linear in the number of samples drawn. Achieving a running time of $\mathrm{poly}(k, \frac{1}{\epsilon})$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Moreover, our approach offers an agnostic learning guarantee: our algorithm returns a good GMM even if the distribution we are sampling from is not a mixture of Gaussians. To the best of our knowledge, our algorithm is the first agnostic proper learning algorithm for GMMs.
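For reference, the closeness guarantee above is the standard total variation distance, which for densities equals half the $L_1$ distance; this is the textbook definition, not notation specific to this paper:

$$ d_{\mathrm{TV}}(\mathcal{M}, f) \;=\; \tfrac{1}{2}\,\|\mathcal{M} - f\|_1 \;=\; \tfrac{1}{2}\int_{\mathbb{R}} \bigl|\mathcal{M}(x) - f(x)\bigr|\,dx, \qquad \mathcal{M}(x) \;=\; \sum_{i=1}^{k} w_i\,\mathcal{N}(x;\mu_i,\sigma_i^2), $$

where $w_i \ge 0$ and $\sum_{i=1}^{k} w_i = 1$. Proper learning asks for a hypothesis $\mathcal{M}$ of exactly this $k$-GMM form with $d_{\mathrm{TV}}(\mathcal{M}, f)$ small; in the agnostic setting described in the abstract, $f$ need not itself be a $k$-GMM, and the returned mixture should be nearly as close to $f$ as the best $k$-GMM approximation is.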
