Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons

Consider the standard Gaussian linear regression model $Y = X\theta + \epsilon$, where $Y \in \mathbb{R}^n$ is a response vector and $X \in \mathbb{R}^{n \times p}$ is a design matrix. Numerous works have been devoted to building efficient estimators of $\theta$ when $p$ is much larger than $n$. In such a situation, a classical approach amounts to assuming that $\theta$ is approximately sparse. This paper studies the minimax risks of estimation and testing over classes of $k$-sparse vectors $\theta$. These bounds shed light on the limitations due to high dimensionality. The results encompass the problem of prediction (estimation of $X\theta$), the inverse problem (estimation of $\theta$) and linear testing (testing $X\theta = 0$). Interestingly, an elbow effect occurs when the number of variables $k \log(p/k)$ becomes large compared to $n$. Indeed, the minimax risks and hypothesis separation distances blow up in this ultra-high-dimensional setting. We also prove that even dimension reduction techniques cannot provide satisfying results in an ultra-high-dimensional setting. Moreover, we compute the minimax risks when the variance of the noise is unknown. The knowledge of this variance is shown to play a significant role in the optimal rates of estimation and testing. All these minimax bounds characterize statistical problems that are so difficult that no procedure can provide satisfying results.
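To make the setting concrete, here is a minimal Python sketch that simulates a sparse Gaussian linear model, response = design matrix times a k-sparse coefficient vector plus Gaussian noise, and computes the ratio k log(p/k) / n, the quantity whose growth relative to n marks the ultra-high-dimensional regime. The function names, the i.i.d. standard Gaussian design, and the unit-valued nonzero coefficients are assumptions made for illustration only; they are not taken from the paper, and the abstract gives no numerical threshold for the ratio.

```python
import math
import numpy as np


def simulate_sparse_regression(n, p, k, sigma=1.0, seed=0):
    """Draw (X, theta, Y) from Y = X @ theta + noise with a k-sparse theta.

    Assumptions for illustration: i.i.d. N(0, 1) design entries and
    nonzero coefficients all equal to 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    theta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)  # k distinct indices
    theta[support] = 1.0
    Y = X @ theta + sigma * rng.standard_normal(n)
    return X, theta, Y


def sparsity_ratio(n, p, k):
    """Return k * log(p / k) / n, comparing effective complexity to sample size."""
    return k * math.log(p / k) / n


X, theta, Y = simulate_sparse_regression(n=50, p=1000, k=5)
print("design shape:", X.shape)
print("k log(p/k) / n =", sparsity_ratio(50, 1000, 5))
```

Increasing `p` or `k` at fixed `n` raises the ratio, which is the direction in which the abstract says the minimax risks and separation distances deteriorate.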