Subspace approximation with outliers

The subspace approximation problem with outliers is defined as follows. Given $n$ points $x_1, \dots, x_n \in \mathbb{R}^d$, an integer $1 \le k \le d$, and an outlier parameter $0 \le \alpha \le 1$, find a $k$-dimensional linear subspace of $\mathbb{R}^d$ that minimizes the sum of squared distances to its nearest $(1-\alpha)n$ points. More generally, the $\ell_p$ subspace approximation problem with outliers minimizes the sum of $p$-th powers of distances instead of the sum of squared distances. Even the case $p = 2$, i.e., robust PCA, is non-trivial, and previous work requires additional assumptions on the input. Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem. This is the special case in which the $(1-\alpha)n$ inliers in the optimal solution are promised to lie exactly on a $k$-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard.

We show how to extend dimension reduction techniques and sampling-based bi-criteria approximations to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we make an assumption about the optimal $k$-dimensional subspace: its squared-distance error summed over the optimal $(1-\alpha)n$ inliers is at least $\delta$ times its squared-distance error summed over all $n$ points, for some $0 < \delta \le 1 - \alpha$. Under this assumption, we give an efficient algorithm that finds a subset of points whose span contains a $k$-dimensional subspace giving a multiplicative $(1+\epsilon)$-approximation to the optimal solution. The running time of our algorithm is linear in $n$ and $d$. Interestingly, our results hold even when the fraction of outliers $\alpha$ is large, as long as the obvious condition $0 < \delta \le 1 - \alpha$ is satisfied.
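To make the objective concrete, here is a minimal pure-Python sketch that evaluates the cost of one fixed candidate subspace. It does not solve the optimization problem, and it is not the paper's algorithm. The function names (`orthonormalize`, `sq_dist_to_subspace`, `outlier_cost`, `inlier_error_ratio`) are hypothetical. The subspace is passed as a list of spanning vectors and orthonormalized with Gram-Schmidt. The number of inliers kept is assumed to be $\lfloor (1-\alpha)n \rfloor$, a rounding convention chosen here for illustration.

```python
from math import floor, fsum, sqrt
from typing import List, Sequence

Vector = Sequence[float]


def orthonormalize(basis: Sequence[Vector]) -> List[List[float]]:
    """Gram-Schmidt: an orthonormal basis for span(basis); dependent vectors are dropped."""
    ortho: List[List[float]] = []
    for v in basis:
        w = [float(a) for a in v]
        for u in ortho:
            c = fsum(a * b for a, b in zip(u, w))
            w = [a - c * b for a, b in zip(w, u)]
        norm = sqrt(fsum(a * a for a in w))
        if norm > 1e-12:
            ortho.append([a / norm for a in w])
    return ortho


def sq_dist_to_subspace(x: Vector, ortho: Sequence[Vector]) -> float:
    """Squared distance ||x||^2 - sum_j <u_j, x>^2 to the span of an orthonormal basis."""
    total = fsum(a * a for a in x)
    proj = fsum(fsum(a * b for a, b in zip(u, x)) ** 2 for u in ortho)
    return max(total - proj, 0.0)  # clamp tiny negative round-off


def outlier_cost(points: Sequence[Vector], basis: Sequence[Vector],
                 alpha: float, p: float = 2.0) -> float:
    """Sum of p-th powers of distances from span(basis) to its nearest
    floor((1 - alpha) * n) points; the remaining points are treated as outliers."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ortho = orthonormalize(basis)
    sq = sorted(sq_dist_to_subspace(x, ortho) for x in points)
    m = floor((1.0 - alpha) * len(points) + 1e-9)  # assumed rounding convention
    # dist^p = (dist^2)^(p/2); for p = 2 this is the squared distance itself
    return fsum(d2 ** (p / 2.0) for d2 in sq[:m])


def inlier_error_ratio(points: Sequence[Vector], basis: Sequence[Vector],
                       alpha: float) -> float:
    """Squared error over the nearest (1 - alpha)n points divided by the
    squared error over all n points, for the given subspace."""
    full = outlier_cost(points, basis, 0.0)
    if full == 0.0:
        return float("nan")  # ratio undefined: every point lies on the subspace
    return outlier_cost(points, basis, alpha) / full


if __name__ == "__main__":
    pts = [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (0.0, 5.0)]
    x_axis = [[1.0, 0.0]]
    print(outlier_cost(pts, x_axis, alpha=0.25))  # 1.0: the point (0, 5) is discarded
    print(outlier_cost(pts, x_axis, alpha=0.0))   # 26.0: plain subspace approximation
```

Evaluating the objective for a fixed subspace is easy. The difficulty described above comes from searching over subspaces, because the set of nearest points changes with the subspace. Consider the smallest $\lfloor (1-\alpha)n \rfloor$ of $n$ nonnegative errors. Their sum is at most a $(1-\alpha)$ fraction of the total, so `inlier_error_ratio` never exceeds $1-\alpha$ for any subspace. This is one way to see why $\delta \le 1 - \alpha$ is the obvious condition.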