10,000+ Times Accelerated Robust Subset Selection (ARSS)

12 September 2014

Abstract

Subset selection from massive data with degraded information is increasingly popular in a range of applications. The problem remains highly challenging because existing methods are slow and sensitive to outliers. To tackle these two issues, we propose an accelerated robust subset selection (ARSS) method. This is the first work in the subset selection task to apply the $\ell_{p}\left(0\!<\! p\!\leq\!1\right)$ -norm to measure the representation loss, which prevents overly large errors from dominating the objective. As a result, robustness against both outlier samples and outlier features is greatly enhanced. In practice, the sample size is generally much larger than the feature length, i.e., $N\!\gg\! L$ . Based on this observation, we propose a theorem that greatly reduces the computational cost, theoretically from $O\left(N^{3}\right)$ to $O\left(\min\left(L,N\right)^{3}\right)$ . Extensive experiments on ten benchmark datasets demonstrate that our method not only outperforms state-of-the-art methods, but also runs 10,000+ times faster than the most closely related method.
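The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and the abstract does not state the theorem itself. The first part shows, on hypothetical residuals, how raising absolute errors to a power $p\leq 1$ shrinks the share of the loss taken by a single gross outlier compared with a squared loss. The second part assumes, purely for illustration, that the expensive step is inverting an $N\times N$ matrix of the form $\lambda I_N + X^{\top}X$ with $X\in\mathbb{R}^{L\times N}$; under that assumption the standard Woodbury identity replaces it with an $L\times L$ inverse. The names `lp_loss`, `outlier_share`, `inverse_via_small_system`, and `lam` are hypothetical.

```python
import numpy as np


def lp_loss(residuals, p):
    """Sum of |r|^p over all entries (p-th power of the entrywise l_p quasi-norm)."""
    return float(np.sum(np.abs(residuals) ** p))


def outlier_share(residuals, p):
    """Fraction of the total loss contributed by the largest-magnitude entry."""
    powered = np.abs(residuals) ** p
    return float(powered.max() / powered.sum())


def inverse_via_small_system(X, lam):
    """Compute inv(lam * I_N + X.T @ X) using only an L x L inverse.

    Assumed matrix form (not taken from the paper). Uses the Woodbury identity:
    inv(lam*I_N + X^T X) = (I_N - X^T inv(lam*I_L + X X^T) X) / lam
    """
    L, N = X.shape
    small_inv = np.linalg.inv(lam * np.eye(L) + X @ X.T)  # L x L inverse only
    return (np.eye(N) - X.T @ small_inv @ X) / lam


# Hypothetical residuals: nine small errors and one gross outlier.
residuals = np.array([0.1] * 9 + [10.0])
for p in (2.0, 1.0, 0.5):
    print(f"p={p}: loss={lp_loss(residuals, p):.3f}, "
          f"outlier share={outlier_share(residuals, p):.3f}")

# Hypothetical data with N much larger than L.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))  # L = 3 features, N = 50 samples
fast = inverse_via_small_system(X, lam=0.7)
direct = np.linalg.inv(0.7 * np.eye(50) + X.T @ X)
print("matches direct N x N inverse:", np.allclose(fast, direct))
```

In this sketch the outlier's share of the loss falls as $p$ decreases, and the only matrix inverted in `inverse_via_small_system` is $L\times L$, which is where a cubic cost in $\min(L,N)$ rather than in $N$ would come from under the stated assumption.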
