
Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons

Abstract

Consider the standard Gaussian linear regression model $Y = X\theta + \epsilon$, where $Y \in R^n$ is a response vector and $X \in R^{n\times p}$ is a design matrix. Numerous works have been devoted to building efficient estimators of $\theta$ when $p$ is much larger than $n$. In such a situation, a classical approach amounts to assuming that $\theta$ is approximately sparse. This paper studies the minimax risks of estimation and testing over classes of $k$-sparse vectors $\theta$. These bounds shed light on the limitations due to high dimensionality. The results encompass the problem of prediction (estimation of $X\theta$), the inverse problem (estimation of $\theta$), and linear testing (testing $X\theta = 0$). Interestingly, an elbow effect occurs once $k\log(p/k)$ becomes large compared to $n$: the minimax risks and hypothesis separation distances blow up in this ultra-high-dimensional setting. We also prove that even dimension-reduction techniques cannot provide satisfactory results in an ultra-high-dimensional setting. Moreover, we compute the minimax risks when the variance of the noise is unknown. The knowledge of this variance is shown to play a significant role in the optimal rates of estimation and testing.
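As a rough illustration of the two regimes discussed in the abstract, the sketch below simulates the model $Y = X\theta + \epsilon$ with a $k$-sparse $\theta$ and compares $k\log(p/k)$ to $n$. This is a minimal, hypothetical example: the parameter values `n`, `p`, `k`, and `sigma` are assumptions chosen only to land in the ultra-high-dimensional regime, not values taken from the paper.

```python
import numpy as np

# Minimal sketch of the sparse Gaussian linear model from the abstract.
# All sizes below (n, p, k, sigma) are illustrative assumptions.
rng = np.random.default_rng(0)
n, p, k, sigma = 60, 10_000, 20, 1.0  # p >> n, theta is k-sparse

# Gaussian design matrix X in R^{n x p}
X = rng.standard_normal((n, p))

# k-sparse coefficient vector theta: k random coordinates are nonzero
theta = np.zeros(p)
theta[rng.choice(p, size=k, replace=False)] = rng.standard_normal(k)

# Response Y = X theta + eps with i.i.d. Gaussian noise
Y = X @ theta + sigma * rng.standard_normal(n)

# The quantity governing the elbow effect: k * log(p/k) compared to n.
# When it is large compared to n, we are in the ultra-high-dimensional
# setting where the minimax risks are shown to blow up.
complexity = k * np.log(p / k)
print(f"k log(p/k) = {complexity:.1f}  vs  n = {n}")
print("ultra-high-dimensional regime" if complexity > n
      else "moderate-dimensional regime")
```

With these values $k\log(p/k) \approx 124 > n = 60$, so the example sits in the ultra-high-dimensional regime; shrinking $k$ or $p$ (or growing $n$) moves it back across the elbow.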
