Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons

Abstract

Consider the standard linear regression model $Y = X\theta + \epsilon$, where $Y \in \mathbb{R}^n$ is a response vector, $X \in \mathbb{R}^{n\times p}$ is a design matrix, $\theta \in \mathbb{R}^p$ is the unknown regression vector, and $\epsilon \sim \mathcal{N}(0_n, \sigma^2 I_n)$ is Gaussian noise. Numerous works have been devoted to building efficient estimators of $\theta$ when $p$ is much larger than $n$. In such a situation, a classical approach is to assume that $\theta$ is approximately sparse. This paper studies the minimax risks of estimation and testing over classes of $k$-sparse vectors $\theta$. These bounds shed light on the limitations due to high dimensionality. The results encompass the problem of prediction (estimation of $X\theta$), the inverse problem (estimation of $\theta$), and linear testing (testing a linear hypothesis on $\theta$). Interestingly, an elbow effect occurs when the number of variables $p$ becomes larger than $k\exp(n/k)$: the minimax risks and hypothesis separation distances blow up in this ultra-high-dimensional setting. We also prove that even dimension-reduction techniques cannot provide satisfactory results in an ultra-high-dimensional setting. Finally, the minimax risks are also studied when the variance $\sigma^2$ is unknown. The knowledge of $\sigma^2$ is shown to play a significant role in the optimal rates of estimation and testing.
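The regime boundary $p = k\exp(n/k)$ separating the moderate from the ultra-high-dimensional setting can be made concrete numerically. The sketch below (an illustration of the threshold from the abstract, not of the paper's estimators or proofs; the function names are ours) checks on which side of the boundary a given problem size $(n, p, k)$ falls:

```python
import math

def ultra_high_dim_threshold(n, k):
    """Dimension threshold k * exp(n / k) from the abstract: beyond this
    value of p, the minimax risks over k-sparse vectors blow up
    (the 'elbow effect')."""
    return k * math.exp(n / k)

def is_ultra_high_dimensional(n, p, k):
    """Return True when p exceeds k * exp(n / k)."""
    return p > ultra_high_dim_threshold(n, k)

# Example: n = 100 observations and sparsity k = 10 give a threshold of
# 10 * exp(10), roughly 2.2e5.  A design with p = 1e6 variables is thus
# ultra-high-dimensional, while p = 1e4 is not.
print(is_ultra_high_dimensional(100, 10**6, 10))  # True
print(is_ultra_high_dimensional(100, 10**4, 10))  # False
```

Note that for fixed $n$ the threshold decreases very quickly as the sparsity $k$ shrinks, so the elbow is easiest to hit when very few coefficients are active.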
