Some Two-Step Procedures for Variable Selection in High-Dimensional Linear Regression

Abstract

We study the problem of high-dimensional variable selection via two-step procedures. First, we show that given a good initial estimator that is $\ell_\infty$-consistent but not necessarily variable selection consistent, we can apply the nonnegative Garrote, the adaptive Lasso, or a hard-thresholding procedure to obtain a final estimator that is both estimation and variable selection consistent. Unlike the Lasso, our results do not require the irrepresentable condition, which can easily fail even for moderate $p_n$ (Zhao and Yu, 2007), and they allow $p_n$ to grow almost as fast as $\exp(n)$ (for hard-thresholding there is no restriction on $p_n$ at all). We also study the conditions under which Ridge regression can be used as the initial estimator. We show that under a relaxed identifiability condition, the Ridge estimator is $\ell_\infty$-consistent. Such a condition is usually satisfied when $p_n \le n$ and does not require the partial orthogonality between relevant and irrelevant covariates that is needed for the univariate regression initial estimator in Huang et al. (2008). Our numerical studies show that when the Lasso or Ridge is used as the initial estimator, the two-step procedures have a higher sparsity recovery rate than the Lasso or the adaptive Lasso with univariate regression used in Huang et al. (2008).
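
To make the two-step idea concrete, below is a minimal sketch (not the authors' code) of two of the procedures the abstract describes: a Ridge initial estimator followed by hard-thresholding, and the same initial estimator followed by the adaptive Lasso. The simulated design, the Ridge penalty alpha, the threshold tau, and the Lasso penalty level are all illustrative choices, not values prescribed by the paper.

```python
# Minimal sketch of two-step variable selection, assuming NumPy and scikit-learn.
# Penalty levels and the threshold are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 100, 80
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]        # sparse true coefficients
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Step 1: Ridge regression as the ell_infinity-consistent initial estimator.
beta_init = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_

# Step 2a: hard-thresholding -- zero out coefficients with magnitude <= tau.
tau = 0.5
beta_ht = np.where(np.abs(beta_init) > tau, beta_init, 0.0)

# Step 2b: adaptive Lasso with weights 1/|beta_init|, solved by rescaling the
# columns of X and running an ordinary Lasso on the rescaled design.
w = 1.0 / np.maximum(np.abs(beta_init), 1e-8)      # guard against division by 0
beta_alasso = Lasso(alpha=0.05, fit_intercept=False).fit(X / w, y).coef_ / w

print("selected by hard-thresholding:", np.flatnonzero(beta_ht))
print("selected by adaptive Lasso:  ", np.flatnonzero(beta_alasso))
```

The adaptive Lasso step uses the standard reparameterization: with weights $w_j = 1/|\hat\beta_j^{\text{init}}|$, scaling column $j$ of $X$ by $1/w_j$ turns the weighted $\ell_1$ penalty into an ordinary Lasso, and dividing the fitted coefficients by $w$ undoes the rescaling.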
