Grassmannian diffusion maps based dimension reduction and classification for high-dimensional data

16 September 2020

Abstract

Diffusion Maps is a nonlinear dimensionality reduction technique that embeds high-dimensional data in a low-dimensional Euclidean space, where the notion of distance is defined by the transition probability of a random walk over the dataset. However, the conventional approach cannot reveal the subspace structure underlying the dataset, which is useful information for machine learning applications such as object classification and face recognition. To circumvent this limitation, a novel nonlinear dimensionality reduction technique, referred to as Grassmannian Diffusion Maps, is developed herein, relying on the affinity between subspaces represented by points on the Grassmann manifold. To this end, a kernel matrix is used to construct the transition matrix of a random walk on a graph connecting points on the Grassmann manifold; the diffusion coordinates that embed the data in a low-dimensional Euclidean space are then determined from this matrix. In this paper, three examples are considered to evaluate the performance of both conventional and Grassmannian Diffusion Maps. First, a "toy" example shows that Grassmannian Diffusion Maps can identify a well-defined parametrization of points on the unit sphere, representing a Grassmann manifold. The second example shows that Grassmannian Diffusion Maps outperforms conventional Diffusion Maps in classifying the elements of a dataset by their intrinsic characteristics, with the classes subsequently recovered by a conventional clustering technique. In the last example, a novel data classification/recognition technique is developed based on the construction of an overcomplete dictionary of reduced dimension whose atoms are given by the diffusion coordinates. A face recognition problem is solved and high recognition rates (i.e., 95% in the best-case scenario) are obtained using a fraction of the data required by conventional methods.
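The pipeline outlined in the abstract (subspaces as points on the Grassmann manifold, a kernel matrix, a random-walk transition matrix, then diffusion coordinates) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: each data point is a matrix whose leading `p` left singular vectors define its subspace, the affinity is the projection kernel, and the random walk uses plain row normalization. The function names `grassmann_point`, `projection_kernel`, and `diffusion_coordinates` are hypothetical.

```python
import numpy as np

def grassmann_point(X, p):
    """Orthonormal basis (n x p) of the dominant p-dimensional column subspace of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :p]

def projection_kernel(bases):
    """K[i, j] = ||U_i^T U_j||_F^2 (projection kernel; an assumed choice of affinity)."""
    n = len(bases)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = np.linalg.norm(bases[i].T @ bases[j], "fro") ** 2
    return K

def diffusion_coordinates(K, n_coords=2, t=1):
    """Embed the points using the leading non-trivial eigenpairs of P = D^-1 K."""
    d = K.sum(axis=1)                       # node degrees of the kernel graph
    S = K / np.sqrt(np.outer(d, d))         # symmetric matrix similar to P
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]         # sort eigenpairs in descending order
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]       # right eigenvectors of P
    # Skip the first (trivial, constant) eigenvector; scale by eigenvalue^t.
    return (evals[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]

# Usage on synthetic data: ten random 6 x 4 matrices, subspace dimension p = 2.
rng = np.random.default_rng(0)
samples = [rng.standard_normal((6, 4)) for _ in range(10)]
bases = [grassmann_point(X, p=2) for X in samples]
K = projection_kernel(bases)
Y = diffusion_coordinates(K, n_coords=2)    # one row of coordinates per sample
```

Working with the symmetric matrix `S` instead of `P` directly lets the sketch use `np.linalg.eigh`, which returns real eigenvalues; the eigenvectors of `P` are then recovered by rescaling with the inverse square root of the degrees.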
