List-Decodable Mean Estimation in Nearly-PCA Time

Abstract

Traditionally, robust statistics has focused on designing estimators tolerant to a minority of contaminated data. Robust list-decodable learning focuses on the more challenging regime where only a minority $\frac{1}{k}$ fraction of the dataset is drawn from the distribution of interest, and no assumptions are made on the remaining data. We study the fundamental task of list-decodable mean estimation in high dimensions. Our main result is a new list-decodable mean estimation algorithm for bounded covariance distributions with optimal sample complexity and error rate, running in nearly-PCA time. Assuming the ground truth distribution on $\mathbb{R}^d$ has bounded covariance, our algorithm outputs a list of $O(k)$ candidate means, one of which is within distance $O(\sqrt{k})$ of the truth. Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset. We also show that a variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee. This runtime matches, up to logarithmic factors, the cost of performing a single $k$-PCA on the data, which is a natural bottleneck of known algorithms for (very) special cases of our problem, such as clustering well-separated mixtures. Prior to our work, the fastest list-decodable mean estimation algorithms had runtimes $\widetilde{O}(n^2 d k^2)$ and $\widetilde{O}(nd k^{\ge 6})$. Our approach builds on a novel soft downweighting method, $\mathsf{SIFT}$, which is arguably the simplest known polynomial-time mean estimation technique in the list-decodable learning setting. To develop our fast algorithms, we improve the runtime of $\mathsf{SIFT}$ via a careful "win-win-win" analysis of an approximate Ky Fan matrix multiplicative weights procedure we develop, which we believe may be of independent interest.
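
To make the soft downweighting idea concrete, here is a minimal sketch: maintain nonnegative weights over the data points and multiplicatively shrink the weights of points that score high along the top variance direction of the weighted data. This is an illustrative sketch in the spirit of the abstract, not the paper's $\mathsf{SIFT}$ procedure; the function name `soft_downweight_step` and all implementation choices below are assumptions.

```python
# Hypothetical sketch of one soft-downweighting step (illustrative only;
# not the paper's SIFT algorithm or its fast Ky Fan MMW variant).
import numpy as np

def soft_downweight_step(X, w):
    """One soft downweighting step.

    X: (n, d) data matrix; w: nonnegative per-point weights.
    Points scoring high along the top variance direction of the
    weighted data are downweighted multiplicatively.
    """
    w_sum = w.sum()
    mu = (w / w_sum) @ X                        # weighted empirical mean
    Y = X - mu                                  # centered data
    cov = (Y * w[:, None]).T @ Y / w_sum        # weighted covariance
    # Top eigenvector of the weighted covariance; finding k such
    # directions is the k-PCA bottleneck discussed in the abstract.
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    scores = (Y @ v) ** 2                       # outlier scores along v
    # Multiplicative soft downweighting: larger score -> smaller weight.
    w_new = w * (1.0 - scores / scores.max())
    return w_new, mu
```

In the list-decodable setting one would iterate such steps and collect the weighted means encountered along the way as the $O(k)$ candidates in the output list; per the abstract, the paper's fast algorithms replace the exact eigendecomposition above with an approximate Ky Fan matrix multiplicative weights procedure to reach the $\widetilde{O}(ndk)$ runtime.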
