Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons

Abstract

Consider the standard linear regression model $Y = X\theta + \epsilon$, where $Y \in \mathbb{R}^n$ is a response vector, $X \in \mathbb{R}^{n\times p}$ is a design matrix, $\theta \in \mathbb{R}^p$ is the unknown regression vector, and $\epsilon \sim \mathcal{N}(0_n, \sigma^2 I_n)$ is Gaussian noise. Numerous works have been devoted to building efficient estimators of $\theta$ when $p$ is much larger than $n$. In such a situation, a classical approach is to assume that $\theta$ is approximately sparse. This paper studies the minimax risks of estimation and testing over classes of $k$-sparse vectors $\theta$. These bounds shed light on the limitations due to high dimensionality. The results encompass the problem of prediction (estimation of $X\theta$), the inverse problem (estimation of $\theta$), and linear testing (testing a linear hypothesis on $\theta$). Interestingly, an elbow effect occurs when the number of variables $p$ becomes larger than $k\exp(n/k)$: the minimax risks and hypothesis separation distances blow up in this ultra-high-dimensional setting. We also prove that even dimension-reduction techniques cannot provide satisfactory results in an ultra-high-dimensional setting. Finally, the minimax risks are also studied when the variance $\sigma^2$ is unknown. The knowledge of $\sigma^2$ is shown to play a significant role in the optimal rates of estimation and testing.
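The regime boundary $p = k\exp(n/k)$ separating the moderate from the ultra-high-dimensional setting can be made concrete numerically. The sketch below (an illustration of the threshold from the abstract, not of the paper's estimators or proofs; the function names are ours) checks on which side of the boundary a given problem size $(n, p, k)$ falls:

```python
import math

def ultra_high_dim_threshold(n, k):
    """Dimension threshold k * exp(n / k) from the abstract: beyond this
    value of p, the minimax risks over k-sparse vectors blow up
    (the 'elbow effect')."""
    return k * math.exp(n / k)

def is_ultra_high_dimensional(n, p, k):
    """Return True when p exceeds k * exp(n / k)."""
    return p > ultra_high_dim_threshold(n, k)

# Example: n = 100 observations and sparsity k = 10 give a threshold of
# 10 * exp(10), roughly 2.2e5.  A design with p = 1e6 variables is thus
# ultra-high-dimensional, while p = 1e4 is not.
print(is_ultra_high_dimensional(100, 10**6, 10))  # True
print(is_ultra_high_dimensional(100, 10**4, 10))  # False
```

Note that for fixed $n$ the threshold decreases very quickly as the sparsity $k$ shrinks, so the elbow is easiest to hit when very few coefficients are active.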
