
Detecting approximate replicate components of a high-dimensional random vector with latent structure

Abstract

High-dimensional feature vectors are likely to contain sets of measurements that are approximate replicates of one another. In complex applications, or in automated data collection, these feature sets are not known a priori and need to be determined. This work proposes a class of latent factor models on the observed high-dimensional random vector $X \in \mathbb{R}^p$ for defining, identifying and estimating the index set of its approximate replicate components. The model class is parametrized by a $p \times K$ loading matrix $A$ that contains a hidden sub-matrix whose rows can be partitioned into groups of parallel vectors. Under this model class, a set of approximate replicate components of $X$ corresponds to a set of parallel rows in $A$: these entries of $X$ are, up to scale and additive error, the same linear combination of the $K$ latent factors; the value of $K$ is itself unknown. The problem of finding approximate replicates in $X$ reduces to identifying and estimating the location of the hidden sub-matrix within $A$, and the partition of its row index set $H$. Both $H$ and its partition can be fully characterized in terms of a new family of criteria based on the correlation matrix of $X$, and their identifiability, as well as that of the unknown latent dimension $K$, follows as a consequence. The constructive nature of the identifiability arguments enables computationally efficient procedures with consistency guarantees. When $A$ has the errors-in-variables parametrization, the problem becomes harder: the task is then to separate groups of parallel rows that are proportional to canonical basis vectors from other dense parallel rows in $A$. This is achieved under a scale assumption, via a principled way of selecting the target row indices, guided by the successive maximization of Schur complements of appropriate covariance matrices.
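As a minimal illustration of why parallel rows of $A$ leave a detectable trace in the correlation matrix, the sketch below assumes $X = AZ + E$ with $\operatorname{Cov}(Z) = I_K$ and independent errors with diagonal covariance. This parametrization, the toy loading matrix, and all function names (`example_loadings`, `population_correlation`, `proportionality_gap`, `group_parallel_rows`) are hypothetical and not taken from the paper. If rows $i$ and $j$ of $A$ are parallel, then for every $k \notin \{i, j\}$ the correlations $R_{ik}$ and $R_{jk}$ are proportional with a factor that does not depend on $k$, so the off-diagonal parts of rows $i$ and $j$ of the correlation matrix $R$ are proportional. The sketch scores this with one minus the absolute cosine between those two row segments and merges pairs whose score falls below a tolerance. It is not the paper's family of criteria, and it ignores the hidden sub-matrix structure, the estimation of $K$, and the errors-in-variables case.

```python
import numpy as np


def example_loadings():
    """Hypothetical 8 x 3 loading matrix A with two groups of parallel rows."""
    v1 = np.array([1.0, 0.5, 0.0])
    v2 = np.array([0.0, 1.0, -0.8])
    dense = np.array([[0.6, -0.4, 0.9],
                      [-0.3, 0.8, 0.5],
                      [0.7, 0.2, -0.6]])
    # Rows 0-2 are multiples of v1, rows 3-4 are multiples of v2,
    # rows 5-7 are dense rows that are not parallel to any other row.
    return np.vstack([1.0 * v1, 2.0 * v1, -1.5 * v1, 1.0 * v2, 0.7 * v2, dense])


def population_correlation(A, noise_var):
    """Correlation of X = A Z + E, assuming Cov(Z) = I and Cov(E) = diag(noise_var)."""
    sigma = A @ A.T + np.diag(noise_var)
    sd = np.sqrt(np.diag(sigma))
    return sigma / np.outer(sd, sd)


def proportionality_gap(R, i, j):
    """1 - |cos| between rows i and j of R with columns i and j removed.

    The gap is zero (up to rounding) exactly when the two row segments are proportional.
    """
    keep = np.ones(R.shape[0], dtype=bool)
    keep[[i, j]] = False
    u, w = R[i, keep], R[j, keep]
    return 1.0 - abs(u @ w) / (np.linalg.norm(u) * np.linalg.norm(w))


def group_parallel_rows(R, tol=1e-6):
    """Merge indices whose pairwise gap is below tol (union-find); return groups of size >= 2."""
    p = R.shape[0]
    parent = list(range(p))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(p):
        for j in range(i + 1, p):
            if proportionality_gap(R, i, j) < tol:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) > 1)


A = example_loadings()
R = population_correlation(A, np.full(A.shape[0], 0.3))
print(group_parallel_rows(R))  # [[0, 1, 2], [3, 4]]
```

With a sample correlation matrix, for example `np.corrcoef(X, rowvar=False)` computed from draws of the model, the gaps of parallel pairs are only approximately zero, so the tolerance has to be chosen from the data; this toy sketch carries none of the consistency guarantees described above.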
