Sparse Subspace Clustering: Algorithm, Theory, and Applications

5 March 2012

Abstract

In many real-world problems, we deal with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories of the data. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points that come from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. As solving the sparse optimization program is NP-hard, we consider its convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed $\ell_1$-minimization program succeeds in recovering the desired sparse representations. Thanks to the global sparse optimization program, the proposed algorithm does not require initialization, can be solved efficiently, and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by modifying the optimization program to incorporate the model of the data. We verify the effectiveness of our proposed algorithm through experiments on synthetic data as well as two real-world problems of motion segmentation and face clustering.
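
The two stages described above (sparse self-representation, then spectral clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a Lasso-style relaxation, $\min_c \tfrac{1}{2}\|y_i - Yc\|_2^2 + \lambda\|c\|_1$ subject to $c_i = 0$, solved with plain ISTA, and a symmetric affinity $|C| + |C|^\top$. The names `sparse_codes` and `two_way_spectral`, the value of `lam`, and the toy data (two orthogonal planes in $\mathbb{R}^4$) are hypothetical choices made for the example. A cosine test against the first point stands in for the k-means step that spectral clustering would normally use, so the sketch handles only two clusters.

```python
import numpy as np

def sparse_codes(Y, lam=0.01, n_iter=500):
    """For each column y_i of Y, approximately minimize
    0.5*||y_i - Y c||^2 + lam*||c||_1 subject to c_i = 0, using ISTA."""
    n = Y.shape[1]
    step = 1.0 / np.linalg.norm(Y, 2) ** 2  # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for i in range(n):
        c = np.zeros(n)
        for _ in range(n_iter):
            z = c - step * (Y.T @ (Y @ c - Y[:, i]))                  # gradient step
            c = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
            c[i] = 0.0                                                # no self-representation
        C[:, i] = c
    return C

def two_way_spectral(C):
    """Split the points into two groups from the sparse coefficients C."""
    W = np.abs(C) + np.abs(C).T                        # symmetric affinity (assumed form)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                        # eigenvalues in ascending order
    E = vecs[:, :2]                                    # two smallest eigenvectors
    E = E / np.linalg.norm(E, axis=1, keepdims=True)   # row-normalize the embedding
    # Crude stand-in for k-means: label 0 if aligned with the first point, else 1.
    return (E @ E[0] < 0.5).astype(int)

def circle_points(degrees):
    """Unit vectors in the plane at the given angles, as a 2 x len(degrees) array."""
    t = np.deg2rad(degrees)
    return np.vstack([np.cos(t), np.sin(t)])

# Toy data: four points in span(e1, e2) and four points in span(e3, e4).
A = circle_points([0, 40, 80, 120])
B = circle_points([20, 65, 110, 155])
Z = np.zeros((2, 4))
Y = np.block([[A, Z], [Z, B]])  # 4 x 8 data matrix, one point per column

C = sparse_codes(Y)
labels = two_way_spectral(C)
```

On this toy data the cross-subspace blocks of `C` are zero, so each point is represented only by points from its own plane, and `labels` separates the first four points from the last four. Real data with noise or more than two subspaces would need the modified programs the abstract mentions and a proper k-means step on the spectral embedding.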
