Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures

10 November 2016

Abstract

We prove the first {\em Statistical Query lower bounds} for two fundamental high-dimensional learning problems involving Gaussian distributions: (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single Gaussian with unknown mean. In particular, we show a {\em super-polynomial gap} between the (information-theoretic) sample complexity and the complexity of {\em any} Statistical Query (SQ) algorithm for these problems. Our SQ lower bound for Problem (1) implies that, as far as SQ algorithms are concerned, the computational complexity of learning GMMs is inherently exponential {\em in the dimension of the latent space}, even though there is no such information-theoretic barrier. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm in~\cite{DiakonikolasKKLMS16} is essentially the best possible among all polynomial-time SQ algorithms. On the positive side, we give a new SQ learning algorithm for Problem (2) with optimal accuracy whose running time nearly matches our lower bound. Both of our SQ lower bounds are obtained via a unified moment-matching technique that may be useful in other contexts. Our SQ learning algorithm for Problem (2) relies on a filtering technique that removes outliers based on higher-order tensors. Our lower bound technique also has implications for related inference problems, specifically for the problem of robust {\em testing} of a Gaussian with unknown mean. Here we show an information-theoretic lower bound that separates the sample complexity of the robust testing problem from that of its non-robust variant. This result is surprising because no such separation exists for the corresponding learning problem.
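
For context, the Statistical Query model referred to above can be sketched as follows. This is the standard textbook formulation, not taken from the abstract itself: the symbols $D$, $q$, $\tau$ and the oracle name $\mathrm{STAT}$ are assumptions introduced here for illustration, and the paper may work with a different variant of the oracle. An SQ algorithm never sees samples from the unknown distribution $D$ on $\mathbb{R}^n$ directly. Instead, it submits a bounded query function and receives its expectation up to a tolerance $\tau > 0$:

\[
q : \mathbb{R}^n \to [-1, 1], \qquad \mathrm{STAT}(\tau) \text{ returns some } v \in \mathbb{R} \text{ with } \bigl| v - \mathbb{E}_{X \sim D}[q(X)] \bigr| \le \tau .
\]

Under this formulation, an SQ lower bound says that any algorithm solving the problem must either make a large number of queries or use queries of very small tolerance $\tau$.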
