De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices

Abstract

Sparse principal component analysis (sPCA) has become one of the most widely used techniques for dimensionality reduction in high-dimensional datasets. The main challenge underlying sPCA is to estimate the first vector of loadings of the population covariance matrix, provided that only a certain number of loadings are non-zero. In this paper, we propose confidence intervals for individual loadings and for the largest eigenvalue of the population covariance matrix. Given an independent sample $X^i \in \mathbb{R}^p$, $i = 1,\dots,n$, generated from an unknown distribution with an unknown covariance matrix $\Sigma_0$, our aim is to estimate the first vector of loadings and the largest eigenvalue of $\Sigma_0$ in a setting where $p \gg n$. Next to the high-dimensionality, another challenge lies in the inherent non-convexity of the problem. We base our methodology on a Lasso-penalized M-estimator which, despite non-convexity, may be solved by a polynomial-time algorithm such as coordinate or gradient descent. We show that our estimator achieves the minimax optimal rates in $\ell_1$- and $\ell_2$-norm. We identify the bias in the Lasso-based estimator and propose a de-biased sparse PCA estimator for the vector of loadings and for the largest eigenvalue of the covariance matrix $\Sigma_0$. Our main results provide theoretical guarantees for asymptotic normality of the de-biased estimator. The major conditions we impose are sparsity in the first eigenvector of small order $\sqrt{n}/\log p$ and sparsity of the same order in the columns of the inverse Hessian matrix of the population risk.
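To make the idea of a Lasso-penalized, non-convex M-estimator solved by gradient descent concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's estimator: the objective $\tfrac14\|\hat\Sigma - \beta\beta^T\|_F^2 + \lambda\|\beta\|_1$, the spectral initialization, the step size, the value of $\lambda$, and all function and variable names are assumptions made for this example. The de-biasing step and the confidence intervals are not implemented.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (coordinate-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_pca_first_loading(X, lam, n_iter=500):
    """Proximal gradient descent on an ASSUMED objective
        0.25 * ||S - beta beta^T||_F^2 + lam * ||beta||_1,
    where S is the sample covariance of X (rows are observations).
    Returns (unit-norm loading vector, eigenvalue estimate, raw beta)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n

    # Assumption: start from the leading sample eigenpair, because the
    # problem is non-convex and a local method needs a reasonable start.
    evals, evecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    beta = np.sqrt(evals[-1]) * evecs[:, -1]
    step = 1.0 / (4.0 * evals[-1])            # heuristic step size

    for _ in range(n_iter):
        # Gradient of the smooth part: ||beta||^2 * beta - S beta
        grad = (beta @ beta) * beta - S @ beta
        beta = soft_threshold(beta - step * grad, step * lam)

    eigval = float(beta @ beta)               # ||beta||^2 estimates the top eigenvalue
    loading = beta / np.sqrt(eigval) if eigval > 0 else beta
    return loading, eigval, beta

# Synthetic spiked-covariance example (all values are illustrative).
rng = np.random.default_rng(0)
p, n = 50, 500
v = np.zeros(p)
v[:5] = 1.0 / np.sqrt(5.0)                    # sparse first eigenvector
Sigma0 = np.eye(p) + 4.0 * np.outer(v, v)     # largest eigenvalue equals 5
X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)

loading, eigval, beta = sparse_pca_first_loading(X, lam=0.5)
print("nonzero loadings:", np.count_nonzero(beta))
print("eigenvalue estimate:", eigval)
print("|<loading, v>|:", abs(loading @ v))
```

The $\ell_1$ penalty shrinks the estimate, so the eigenvalue read off as $\|\beta\|^2$ is biased downward in this sketch; correcting bias of this kind is what the de-biased estimator described in the abstract is designed to do.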
