Modelling data as samples drawn from a union of independent or disjoint subspaces has been widely used in real-world applications. Recently, high-dimensional data has come into focus owing to advances in computational power and storage capacity. However, many algorithms that assume this data model have high time complexity, which makes them impractical at scale. Dimensionality reduction is a commonly used technique for tackling this problem. In this paper, we first formalize the concept of \textit{Independent Subspace Structure} and then propose two randomized algorithms (one supervised, one unsupervised) for subspace learning that provably preserve this structure for any given dataset. This has important implications for subspace segmentation, as well as for subspace-based clustering and low-rank recovery of data, which can then be carried out in low dimensions by first applying our dimensionality reduction technique to the high-dimensional data. We support our theoretical analysis with empirical results on both synthetic and real-world data.
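The abstract does not spell out the proposed algorithms, but the pipeline it describes (randomly project the data to low dimension, then run any subspace clustering method there) can be illustrated with a generic Johnson-Lindenstrauss-style Gaussian random projection. The sketch below is a minimal, hypothetical stand-in: the function name `random_projection`, the target dimension `k`, and the Gaussian construction are illustrative assumptions, not the paper's specific supervised/unsupervised constructions.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Map the d-dimensional columns of X down to k dimensions with a
    Gaussian random matrix. This is a standard JL-style randomized map,
    used here only to illustrate the reduce-then-cluster pipeline; the
    paper's own algorithms may use a different construction."""
    d = X.shape[0]
    rng = np.random.default_rng(seed)
    # Entries drawn i.i.d. N(0, 1/k) so squared norms (and, with high
    # probability, the subspace structure) are preserved in expectation.
    Phi = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
    return Phi @ X

# Usage: reduce 1000-dimensional points lying near a union of subspaces
# to 50 dimensions before running a subspace clustering algorithm.
X = np.random.randn(1000, 200)   # 200 sample points, one per column
X_low = random_projection(X, k=50)
print(X_low.shape)               # (50, 200)
```

Any downstream subspace segmentation or low-rank recovery method would then operate on `X_low` instead of `X`, trading the high ambient dimension for the much smaller projected dimension.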