Affine Invariant Covariance Estimation for Heavy-Tailed Distributions

Annual Conference Computational Learning Theory (COLT), 2019
Abstract

In this work we provide an estimator for the covariance matrix of a heavy-tailed random vector. We prove that the proposed estimator $\widehat{\mathbf{S}}$ admits \textit{affine-invariant} bounds of the form $(1-\varepsilon)\,\mathbf{S} \preccurlyeq \widehat{\mathbf{S}} \preccurlyeq (1+\varepsilon)\,\mathbf{S}$ with high probability, where $\mathbf{S}$ is the unknown covariance matrix and $\preccurlyeq$ is the positive semidefinite order on symmetric matrices. The result requires only the existence of fourth-order moments, and allows for $\varepsilon = O(\sqrt{\kappa^4 d/n})$, where $\kappa^4$ is a measure of kurtosis of the distribution, $d$ is the dimensionality of the space, and $n$ is the sample size. More generally, we can allow for regularization at level~$\lambda$, in which case $\varepsilon$ depends on the degrees-of-freedom number, which is generally smaller than $d$. The computational cost of the proposed estimator is essentially~$O(d^2 n + d^3)$, comparable to that of the sample covariance matrix in the statistically interesting regime~$n \gg d$. Applications to eigenvalue estimation with relative error and to ridge regression with heavy-tailed random design are discussed.
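The affine-invariant guarantee $(1-\varepsilon)\,\mathbf{S} \preccurlyeq \widehat{\mathbf{S}} \preccurlyeq (1+\varepsilon)\,\mathbf{S}$ can be checked numerically: it is equivalent to every eigenvalue of $\mathbf{S}^{-1/2}\widehat{\mathbf{S}}\,\mathbf{S}^{-1/2}$ lying in $[1-\varepsilon, 1+\varepsilon]$. The paper's estimator is not reproduced here; the sketch below (with illustrative names, using the plain sample covariance as a stand-in estimator) only demonstrates how such a bound is evaluated:

```python
import numpy as np

def affine_invariant_error(S_hat, S):
    """Smallest eps with (1-eps) S <= S_hat <= (1+eps) S in the PSD order.

    Equals the largest |lambda - 1| over eigenvalues lambda of
    S^{-1/2} S_hat S^{-1/2}; this quantity is invariant under x -> A x.
    """
    # Symmetric inverse square root of the true covariance S.
    w, V = np.linalg.eigh(S)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T
    lam = np.linalg.eigvalsh(S_inv_half @ S_hat @ S_inv_half)
    return np.max(np.abs(lam - 1.0))

# Stand-in estimator: sample covariance on synthetic Gaussian data.
rng = np.random.default_rng(0)
d, n = 5, 10_000
A = rng.standard_normal((d, d))
S = A @ A.T + np.eye(d)                      # true covariance, well-conditioned
X = rng.multivariate_normal(np.zeros(d), S, size=n)
Xc = X - X.mean(axis=0)
S_hat = Xc.T @ Xc / n
eps = affine_invariant_error(S_hat, S)
print(eps)  # small when n >> d
```

Note that the error measured this way does not change if the data are replaced by $\mathbf{A}x$ for an invertible $\mathbf{A}$, which is exactly what "affine-invariant" means in the abstract.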
