10,000+ Times Accelerated Robust Subset Selection (ARSS)

12 September 2014

Feiyun Zhu

Abstract

Subset selection from massive, noisy data is increasingly popular in various applications. The problem remains highly challenging because existing methods are generally slow and sensitive to outliers. To address these two issues, we propose an accelerated robust subset selection (ARSS) method. In the subset selection area, this is the first work to employ an $\ell_{p}$-norm $(0<p\leq 1)$ based robust measure for the representation loss, which prevents overly large errors from dominating the objective. In this way, robustness against both outlier samples and outlier features is greatly enhanced. In practice, the data size is generally much larger than the feature length, i.e. $N \gg L$. Based on this observation, we propose a theorem that significantly reduces the computational cost of our solver, theoretically from $O(N^{3})$ to $O(\min(L,N)^{3})$. Extensive experiments on ten benchmark datasets demonstrate that our method not only outperforms state-of-the-art methods, but also runs 10,000+ times faster than the most closely related method.
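The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the paper's solver: the abstract does not state the objective or the theorem, so the sketch assumes a ridge-regularized self-representation system as a stand-in and uses the standard push-through identity $(X^{\top}X+\lambda I_N)^{-1}X^{\top} = X^{\top}(XX^{\top}+\lambda I_L)^{-1}$ to show how an $N \times N$ solve can be replaced by an $L \times L$ one when $N \gg L$. The function names `lp_loss`, `solve_naive`, and `solve_small`, and the parameter `lam`, are hypothetical.

```python
import numpy as np


def lp_loss(E, p=0.5):
    """Entrywise l_p measure sum |e_ij|^p with 0 < p <= 1.

    Compared with a squared loss, a single huge residual contributes far
    less, so it cannot dominate the total.
    """
    return float(np.sum(np.abs(E) ** p))


def solve_naive(X, lam):
    """Assumed stand-in system: A = (X^T X + lam I_N)^{-1} X^T X.

    X has shape (L, N); this factorizes an N x N matrix, roughly O(N^3).
    """
    N = X.shape[1]
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(N), G)


def solve_small(X, lam):
    """Same A via the push-through identity, using only an L x L solve.

    A = X^T (X X^T + lam I_L)^{-1} X, roughly O(L^3) for the factorization.
    """
    L = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(L), X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 200))  # L = 5 features, N = 200 samples
    same = np.allclose(solve_naive(X, 0.1), solve_small(X, 0.1))
    print("both routes agree:", same)
```

With residuals `[1, 100]`, the squared loss is dominated by the outlier (10001 in total), whereas `lp_loss` with `p=0.5` gives 1 + 10 = 11, which is the sense in which large errors are kept from dominating.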
