New asymptotic results in principal component analysis

Abstract

Let $X$ be a mean zero Gaussian random vector in a separable Hilbert space $\mathbb{H}$ with covariance operator $\Sigma := \mathbb{E}(X \otimes X)$. Let $\Sigma = \sum_{r \geq 1} \mu_r P_r$ be the spectral decomposition of $\Sigma$ with distinct eigenvalues $\mu_1 > \mu_2 > \dots$ and the corresponding spectral projectors $P_1, P_2, \dots$. Given a sample $X_1, \dots, X_n$ of size $n$ of i.i.d. copies of $X$, the sample covariance operator is defined as $\hat\Sigma_n := n^{-1} \sum_{j=1}^n X_j \otimes X_j$. The main goal of principal component analysis is to estimate the spectral projectors $P_1, P_2, \dots$ by their empirical counterparts $\hat P_1, \hat P_2, \dots$, properly defined in terms of the spectral decomposition of the sample covariance operator $\hat\Sigma_n$. The aim of this paper is to study the asymptotic distributions of important statistics related to this problem, in particular of the statistic $\|\hat P_r - P_r\|_2^2$, where $\|\cdot\|_2^2$ is the squared Hilbert--Schmidt norm. This is done in a "high-complexity" asymptotic framework in which the so-called effective rank ${\bf r}(\Sigma) := \frac{{\rm tr}(\Sigma)}{\|\Sigma\|_\infty}$ (${\rm tr}(\cdot)$ being the trace and $\|\cdot\|_\infty$ being the operator norm) of the true covariance $\Sigma$ becomes large simultaneously with the sample size $n$, but ${\bf r}(\Sigma) = o(n)$ as $n \to \infty$. In this setting, we prove that, in the case of a one-dimensional spectral projector $P_r$, the properly centered and normalized statistic $\|\hat P_r - P_r\|_2^2$ with {\it data-dependent} centering and normalization converges in distribution to a Cauchy type limit. The proofs of this and other related results rely on perturbation analysis and Gaussian concentration.
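The quantities in the abstract can be illustrated numerically in a finite-dimensional analogue, where the covariance operator is a symmetric matrix and spectral projectors come from its eigendecomposition. The sketch below is an assumption-laden toy example (the dimension, eigenvalues, and sample size are illustrative choices, not from the paper): it draws $n$ i.i.d. mean-zero Gaussians with a diagonal covariance of distinct eigenvalues, forms the sample covariance $\hat\Sigma_n$, and computes the squared Hilbert--Schmidt (Frobenius) norm $\|\hat P_1 - P_1\|_2^2$ for the top rank-one projector.

```python
import numpy as np

def spectral_projector(cov, r):
    """Rank-one spectral projector onto the eigenspace of the r-th
    largest eigenvalue of a symmetric matrix (0-indexed r).
    Assumes that eigenvalue is simple (one-dimensional eigenspace)."""
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    v = vecs[:, -(r + 1)]              # eigenvector of the r-th largest eigenvalue
    return np.outer(v, v)              # P_r = v v^T

rng = np.random.default_rng(0)
d, n = 5, 20000
mu = np.array([4.0, 2.0, 1.0, 0.5, 0.25])  # distinct eigenvalues mu_1 > mu_2 > ...
Sigma = np.diag(mu)                         # true covariance operator

# Sample of n i.i.d. copies of a mean-zero Gaussian with covariance Sigma
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Sigma_hat = X.T @ X / n                     # sample covariance operator

P1 = spectral_projector(Sigma, 0)           # true top spectral projector
P1_hat = spectral_projector(Sigma_hat, 0)   # empirical counterpart

# Squared Hilbert--Schmidt norm of the projector error
hs_sq = np.sum((P1_hat - P1) ** 2)
print(hs_sq)
```

With a fixed spectral gap and $n$ large relative to the effective rank ${\rm tr}(\Sigma)/\|\Sigma\|_\infty$, this statistic is small; the paper's contribution is the limiting distribution of this quantity after data-dependent centering and normalization, which the toy example does not attempt.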
