High-dimensional analysis of semidefinite relaxations for sparse principal components

Abstract

Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the "large $p$, small $n$" setting, in which the problem dimension $p$ is comparable to or larger than the sample size $n$. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say with at most $k$ non-zero components. We analyze two computationally tractable methods for recovering the support of this maximal eigenvector: (a) a simple diagonal cut-off method, which transitions from success to failure as a function of the order parameter $\theta_{\mathrm{diag}}(n, p, k) = n/[k^2 \log(p - k)]$; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the order parameter $\theta_{\mathrm{sdp}}(n, p, k) = n/[k \log(p - k)]$ is larger than a critical threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
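To make the two order parameters and the diagonal cut-off concrete, here is a minimal NumPy sketch. It rests on several assumptions not stated in the abstract: data are drawn from a spiked covariance model $\Sigma = I_p + \beta z z^T$ whose sparse spike $z$ has $k$ equal non-zero entries $1/\sqrt{k}$; and the diagonal cut-off is taken to mean "rank coordinates by sample variance and keep the top $k$". The function names (`theta_diag`, `theta_sdp`, `diagonal_cutoff_support`, `simulate_spiked`) are hypothetical. The SDP relaxation itself is not implemented here. Note that the formulas give $\theta_{\mathrm{sdp}} = k \cdot \theta_{\mathrm{diag}}$, so the SDP needs roughly a factor $k$ fewer samples than the diagonal method at the same $(p, k)$.

```python
import numpy as np


def theta_diag(n, p, k):
    """Order parameter n / [k^2 log(p - k)] for the diagonal cut-off method."""
    return n / (k**2 * np.log(p - k))


def theta_sdp(n, p, k):
    """Order parameter n / [k log(p - k)] for the SDP relaxation."""
    return n / (k * np.log(p - k))


def diagonal_cutoff_support(X, k):
    """Estimate the support as the k coordinates with the largest sample variance.

    Assumption: X is an (n, p) matrix of zero-mean samples, so the diagonal of
    the sample covariance X^T X / n is just the mean of squares per column.
    """
    variances = (X**2).mean(axis=0)
    # argsort is ascending, so the last k indices are the k largest variances
    return set(np.argsort(variances)[-k:].tolist())


def simulate_spiked(n, p, k, beta, rng):
    """Hypothetical test model: n samples from N(0, I_p + beta * z z^T).

    z has its first k entries equal to 1/sqrt(k) and the rest zero, so the
    true support of the maximal eigenvector is {0, ..., k-1}.
    """
    z = np.zeros(p)
    z[:k] = 1.0 / np.sqrt(k)
    # x = sqrt(beta) * g * z + w with g ~ N(0, 1), w ~ N(0, I_p)
    g = rng.standard_normal(n)
    W = rng.standard_normal((n, p))
    X = np.sqrt(beta) * np.outer(g, z) + W
    return X, set(range(k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, k, beta = 5000, 200, 5, 4.0
    X, true_support = simulate_spiked(n, p, k, beta, rng)
    print("theta_diag:", theta_diag(n, p, k))
    print("theta_sdp: ", theta_sdp(n, p, k))
    print("support recovered:", diagonal_cutoff_support(X, k) == true_support)
```

Under this model each on-support coordinate has variance $1 + \beta/k$ against $1$ off the support, which is why ranking the diagonal can work at all. The gap $\beta/k$ shrinks as $k$ grows, which is one intuition for why the diagonal method's requirement scales with $k^2$ rather than $k$.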
