Extended BIC for linear regression models with diverging number of relevant features and high or ultra-high feature spaces

In many conventional scientific investigations with high or ultra-high dimensional feature spaces, the relevant features, though sparse, are large in number compared with classical statistical problems, and the magnitude of their effects tapers off. It is therefore reasonable to model the number of relevant features as a diverging sequence as the sample size increases. In this article, we investigate the properties of the extended Bayes information criterion (EBIC) (Chen and Chen, 2008) for feature selection in linear regression models with a diverging number of relevant features in high or ultra-high dimensional feature spaces. We establish the selection consistency of the EBIC in this setting. We then consider the application of the EBIC to feature selection within a two-stage feature selection procedure. Simulation studies demonstrate the finite-sample performance of the EBIC together with the two-stage procedure.
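For readers unfamiliar with the criterion, the following is a minimal sketch of how an EBIC value might be computed for one candidate model. The abstract does not state the formula. The sketch assumes the form commonly attributed to Chen and Chen (2008) for a Gaussian linear model, up to additive constants: n log(RSS/n) + k log n + 2γ log C(p, k), where n is the sample size, p the total number of features, k the size of the candidate model, and RSS its residual sum of squares. The function name `ebic_from_rss` and its parameters are hypothetical, chosen for illustration only.

```python
import math


def ebic_from_rss(rss, n, p, k, gamma=1.0):
    """EBIC of a candidate linear model, given its residual sum of squares.

    Assumed form (Gaussian linear model, additive constants dropped):
        n * log(rss / n) + k * log(n) + 2 * gamma * log C(p, k)

    rss   : residual sum of squares of the fitted candidate model (> 0)
    n     : sample size
    p     : total number of available features
    k     : number of features in the candidate model (0 <= k <= p)
    gamma : EBIC tuning parameter; gamma = 0 gives the ordinary BIC
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    if not 0 <= k <= p:
        raise ValueError("k must satisfy 0 <= k <= p")

    # log of the binomial coefficient C(p, k), computed with log-gamma
    # so that very large p does not create huge intermediate integers.
    log_binom = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)

    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binom


# Illustrative comparison: two candidate models with the same fit (same RSS)
# but different sizes; the smaller EBIC value is preferred.
small = ebic_from_rss(rss=50.0, n=100, p=1000, k=2, gamma=1.0)
large = ebic_from_rss(rss=50.0, n=100, p=1000, k=3, gamma=1.0)
print(small < large)
```

Under this assumed form, the extra term 2γ log C(p, k) grows with the number of candidate models of size k, so the penalty for adding a feature is heavier when p is large than under the ordinary BIC.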