Feature selection and classification of high-dimensional normal vectors with a possibly large number of classes

We consider high-dimensional multi-class classification of normal vectors where, unlike standard assumptions, the number of classes may also be large. We derive non-asymptotic conditions on the effects of significant features, together with lower and upper bounds on the distances between classes, required for successful feature selection and classification with a given accuracy. Furthermore, we study an asymptotic setup in which the number of classes grows with the dimension of the feature space and the sample sizes. To the best of our knowledge, our paper is the first to study this important model. In particular, we present an interesting and, at first glance, somewhat counter-intuitive phenomenon: the precision of classification can improve as the number of classes grows. The reason is more accurate feature selection: even weak significant features, which are not strong enough to manifest in a coarse classification, can nevertheless have a strong impact when the number of classes is large. We consider both the case of a known and of an unknown covariance matrix. The performance of the procedure is demonstrated on simulated and real-data examples.
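The abstract does not spell out the procedure, so the sketch below is only a rough illustration of a generic pipeline of this kind, not the paper's method: select features by thresholding the spread of estimated class means, then classify by nearest centroid on the selected coordinates, assuming unit (identity) covariance. The threshold constant, the sqrt(2 log d / n) scaling, and the toy data are illustrative assumptions.

```python
# A minimal sketch (not the paper's procedure): threshold-based feature
# selection followed by nearest-centroid classification for Gaussian data
# with identity covariance. Threshold and data settings are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def select_features(X, y, n_classes):
    """Keep features whose estimated class means spread out more than a
    noise-level threshold; sqrt(2 log d / n) is a standard scaling for
    unit-variance noise, used here as an assumption."""
    d = X.shape[1]
    means = np.vstack([X[y == k].mean(axis=0) for k in range(n_classes)])
    n_min = min((y == k).sum() for k in range(n_classes))
    spread = means.max(axis=0) - means.min(axis=0)   # per-feature spread
    thresh = 2.0 * np.sqrt(2.0 * np.log(d) / n_min)  # assumed constant 2.0
    return np.flatnonzero(spread > thresh), means

def classify(X, means, selected):
    """Nearest centroid on the selected coordinates (Euclidean distance,
    i.e., identity covariance assumed)."""
    Xs = X[:, selected]                                # (n, s_hat)
    Ms = means[:, selected]                            # (K, s_hat)
    dist2 = ((Xs[:, None, :] - Ms[None, :, :]) ** 2).sum(axis=2)
    return dist2.argmin(axis=1)

# Toy example: d features, K classes, signal in the first s coordinates.
d, K, n_per_class, s = 500, 10, 40, 20
mu = np.zeros((K, d))
mu[:, :s] = rng.normal(0.0, 2.0, size=(K, s))
X_train = np.vstack([rng.normal(mu[k], 1.0, (n_per_class, d)) for k in range(K)])
y_train = np.repeat(np.arange(K), n_per_class)
X_test = np.vstack([rng.normal(mu[k], 1.0, (n_per_class, d)) for k in range(K)])
y_test = np.repeat(np.arange(K), n_per_class)

selected, means = select_features(X_train, y_train, K)
y_hat = classify(X_test, means, selected)
print(f"selected {len(selected)} features, accuracy = {(y_hat == y_test).mean():.3f}")
```

In this sketch the phenomenon described above has a simple mechanism: with more classes, the max-minus-min spread of class means over a significant feature tends to grow, so even a weakly significant feature becomes easier to distinguish from pure noise at the selection stage.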