UPS delivers optimal phase diagram in high-dimensional variable selection

25 October 2010

Abstract

Consider a linear model $Y=X\beta+z$, $z\sim N(0,I_n)$. Here, $X=X_{n,p}$, where both $p$ and $n$ are large, but $p>n$. We model the rows of $X$ as i.i.d. samples from $N(0,\frac{1}{n}\Omega)$, where $\Omega$ is a $p\times p$ correlation matrix, which is unknown to us but is presumably sparse. The vector $\beta$ is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros. We propose Univariate Penalization Screening (UPS) for variable selection. This is a screen-and-clean method: we screen with univariate thresholding and clean with penalized MLE. It has two important properties: sure screening and separable after screening. These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation.
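To make the screen-and-clean idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a design matrix `X` with roughly unit-norm columns, and every name in it (`ups_sketch`, `_clean`, the thresholds `t`, `lam`, `delta`, and the cap `max_size`) is hypothetical. The screening step thresholds the marginal statistics $X^{T}Y$. The cleaning step is a simplified stand-in for the penalized MLE: it links survivors whose empirical correlation exceeds `delta`, splits them into connected components, and fits each component separately by brute-force $\ell_0$-penalized least squares.

```python
from itertools import combinations

import numpy as np


def _clean(X, Y, cols, lam):
    """Brute-force L0-penalized least squares over one small group of columns.

    Simplified stand-in for the penalized-MLE cleaning step (an assumption,
    not the paper's exact objective).
    """
    best_cost = 0.5 * float(Y @ Y)  # cost of selecting nothing in this group
    best = ()
    for k in range(1, len(cols) + 1):
        for subset in combinations(cols, k):
            Xs = X[:, list(subset)]
            coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
            resid = Y - Xs @ coef
            cost = 0.5 * float(resid @ resid) + 0.5 * lam**2 * k
            if cost < best_cost:
                best_cost, best = cost, subset
    return list(best)


def ups_sketch(X, Y, t, lam, delta=0.1, max_size=10):
    """Screen with univariate thresholding, then clean each component separately.

    Returns a sorted list of selected column indices.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    # Screen: keep coordinates whose marginal statistic exceeds the threshold.
    w = X.T @ Y
    survivors = np.flatnonzero(np.abs(w) > t)
    if survivors.size == 0:
        return []

    # Link two survivors when their empirical correlation exceeds delta.
    gram = X[:, survivors].T @ X[:, survivors]
    adj = np.abs(gram) > delta

    # Split survivors into connected components with a depth-first search.
    unseen = set(range(survivors.size))
    selected = []
    while unseen:
        stack = [unseen.pop()]
        comp = []
        while stack:
            i = stack.pop()
            comp.append(i)
            nbrs = [j for j in unseen if adj[i, j]]
            unseen.difference_update(nbrs)
            stack.extend(nbrs)
        cols = survivors[sorted(comp)]
        if len(cols) > max_size:
            raise ValueError("component too large for brute-force cleaning")
        # Clean: each component is a small regression fitted on its own.
        selected.extend(_clean(X, Y, cols, lam))
    return sorted(int(j) for j in selected)
```

Calling `ups_sketch(X, Y, t, lam)` returns the indices it estimates to be nonzero. The exhaustive subset search is only feasible because screening is expected to leave small components, which is the role of the separable-after-screening property described above.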
