Spectral Sparse Representation for Clustering: Evolved from PCA, K-means, Laplacian Eigenmap, and Ratio Cut

Dimensionality reduction, cluster analysis, and sparse representation are among the cornerstones of machine learning. However, they seem unrelated to each other and are often applied independently in practice. In this paper, we discover that spectral graph theory underlies a series of these elementary methods and unifies them into a complete framework: PCA, K-means, Laplacian eigenmap (LE), ratio cut (Rcut), and a new sparse representation method we uncover, called spectral sparse representation (SSR). We further incorporate extended relations to conventional over-complete sparse representations (e.g., MOD, KSVD), manifold learning (e.g., kernel PCA, MDS, Isomap, LLE), and subspace clustering (e.g., SSC, LRR). We show that, under an ideal condition from spectral graph theory, PCA, K-means, LE, and Rcut are unified; when the condition is relaxed, the unification evolves into SSR, which lies between PCA/LE and K-means/Rcut and combines the merits of both sides: its sparse codes reduce the dimensionality of the data while revealing the cluster structure. SSR exhibits many useful properties and clear interpretations, and owing to its inherent relation to cluster analysis, its codes can be used directly for clustering. The linear version of SSR is under-complete, complementing conventional over-complete sparse representations. We develop an efficient algorithm, NSCrt, to compute the sparse codes of SSR. By virtue of its good performance, the application of SSR to clustering, called Scut, achieves state-of-the-art performance within the spectral clustering family. Experiments on data sets of diverse nature demonstrate the properties and strengths of SSR, NSCrt, and Scut.
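The abstract refers to the classical ratio-cut relaxation, in which the eigenvectors of the graph Laplacian (the Laplacian eigenmap embedding) are clustered with K-means. Below is a minimal sketch of that standard baseline pipeline, not of the paper's SSR, NSCrt, or Scut algorithms; the k-NN graph construction and all parameters are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

def rcut_spectral_clustering(X, n_clusters, n_neighbors=10):
    """Classical Rcut relaxation: Laplacian eigenmap embedding + K-means."""
    # Symmetric k-NN affinity graph (binary weights for simplicity).
    W = kneighbors_graph(X, n_neighbors=n_neighbors, mode='connectivity')
    W = 0.5 * (W + W.T)  # symmetrize
    # Unnormalized graph Laplacian L = D - W.
    L = laplacian(W.toarray())
    # Eigenvectors of the smallest eigenvalues give the LE embedding;
    # the trivial constant eigenvector (eigenvalue ~ 0) is dropped.
    eigvals, eigvecs = np.linalg.eigh(L)
    Y = eigvecs[:, 1:n_clusters]  # relaxed Rcut indicator matrix
    # K-means on the spectral embedding recovers discrete clusters.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Y)

# Usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
labels = rcut_spectral_clustering(X, n_clusters=2)
```

Under the ideal condition mentioned in the abstract (well-separated clusters, so the graph has nearly disconnected components), these Laplacian eigenvectors approach piecewise-constant cluster indicators, which is what ties LE and Rcut to K-means in the paper's framework.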