Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

Abstract

We study the column subset selection problem with respect to the entrywise $\ell_1$-norm loss. It is known that in the worst case, to obtain a good rank-$k$ approximation to a matrix, one needs an arbitrarily large $n^{\Omega(1)}$ number of columns to obtain a $(1+\epsilon)$-approximation to the best entrywise $\ell_1$-norm low rank approximation of an $n \times n$ matrix. Nevertheless, we show that under certain minimal and realistic distributional settings, it is possible to obtain a $(1+\epsilon)$-approximation with a nearly linear running time and $\mathrm{poly}(k/\epsilon)+O(k\log n)$ columns. Namely, we show that if the input matrix $A$ has the form $A = B + E$, where $B$ is an arbitrary rank-$k$ matrix, and $E$ is a matrix with i.i.d. entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time. Conversely we show that if the first moment does not exist, then it is not possible to obtain a $(1+\epsilon)$-approximate subset selection algorithm even if one chooses any $n^{o(1)}$ columns. This is the first algorithm of any kind for achieving a $(1+\epsilon)$-approximation for entrywise $\ell_1$-norm loss low rank approximation.
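
To make the input model concrete, the following is a minimal sketch of an instance $A = B + E$ and of the entrywise $\ell_1$ loss. It is illustrative only and does not implement the paper's selection algorithm. The function names `entrywise_l1` and `make_instance` are hypothetical, and the Student-t noise is just one assumed choice of $\mu$: with $1 + 2\gamma$ degrees of freedom it has finite absolute moments of every order below $1 + 2\gamma$, so its $(1+\gamma)$-th moment exists.

```python
import numpy as np


def entrywise_l1(M):
    """Entrywise l1 norm: the sum of absolute values of all entries of M."""
    return float(np.abs(M).sum())


def make_instance(n, k, gamma, rng):
    """Return (A, B, E) with A = B + E (hypothetical helper, not from the paper).

    B is an n x n matrix of rank at most k, built as a product of two
    Gaussian factors. E has i.i.d. Student-t entries with 1 + 2*gamma
    degrees of freedom, an assumed example of a distribution whose
    (1 + gamma)-th moment exists.
    """
    B = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
    E = rng.standard_t(df=1.0 + 2.0 * gamma, size=(n, n))
    return B + E, B, E


rng = np.random.default_rng(0)
A, B, E = make_instance(n=50, k=3, gamma=0.25, rng=rng)

# The planted rank-k matrix B incurs entrywise l1 loss ||A - B||_1 = ||E||_1.
print("loss of planted B:", entrywise_l1(A - B))
```

The loss of the planted matrix $B$ is a natural reference point in this model: any rank-$k$ candidate can be compared against it using the same `entrywise_l1` function.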
