False Discovery Rate Control via Debiased Lasso

Abstract

We consider the problem of variable selection in high-dimensional statistical models, where the goal is to report a set of variables, out of many predictors $X_1, \dotsc, X_p$, that are relevant to a response of interest. For the high-dimensional linear model, where the number of parameters exceeds the number of samples ($p > n$), we propose a variable selection procedure and prove that it controls the "directional" false discovery rate (FDR) below a pre-assigned significance level $q \in [0,1]$. We further analyze the statistical power of our framework and show that for designs with subgaussian rows and a common precision matrix $\Omega \in \mathbb{R}^{p \times p}$, if the minimum nonzero parameter $\theta_{\min}$ satisfies
$$\sqrt{n}\,\theta_{\min} - \sigma \sqrt{2\Big(\max_{i\in [p]}\Omega_{ii}\Big)\log\Big(\frac{2p}{qs_0}\Big)} \to \infty\,,$$
then this procedure achieves asymptotic power one. Our framework is built upon the debiasing approach and assumes the standard condition $s_0 = o(\sqrt{n}/(\log p)^2)$, where $s_0$ indicates the number of true positives among the $p$ features. Notably, this framework achieves exact directional FDR control without any assumption on the amplitude of the unknown regression parameters, and it requires no knowledge of the distribution of the covariates or of the noise level. We test our method on synthetic and real data to assess its performance and to corroborate our theoretical results.
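As we read the abstract, the procedure combines debiased-lasso test statistics with an FDR-controlling selection rule that also reports signs. The sketch below is a minimal illustration of that pipeline under our own assumptions, not the paper's exact algorithm: a pseudo-inverse of the sample covariance stands in for the nodewise-lasso precision estimate, `LassoCV` replaces whatever penalty choice the paper uses, a Benjamini-Hochberg step-up rule stands in for the paper's selection rule, and the synthetic data and residual-based noise estimate are purely for demonstration.

```python
# Sketch (assumptions labeled inline): debiased lasso + BH-type selection
# at target FDR level q, reporting signed ("directional") discoveries.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s0, q = 200, 400, 10, 0.1           # q: target FDR level
theta = np.zeros(p)
theta[:s0] = rng.choice([-1.0, 1.0], s0)   # signed true signals (synthetic)
X = rng.standard_normal((n, p))
y = X @ theta + rng.standard_normal(n)

# Step 1: initial lasso fit (assumption: cross-validated penalty).
lasso = LassoCV(cv=5).fit(X, y)
theta_hat = lasso.coef_
resid = y - X @ theta_hat

# Step 2: precision-matrix surrogate M (assumption: pseudo-inverse of the
# sample covariance in place of the nodewise-lasso estimate of Omega).
Sigma_hat = X.T @ X / n
M = np.linalg.pinv(Sigma_hat)

# Step 3: debiased estimator and plug-in standard errors; the noise level
# sigma is estimated from lasso residuals (assumption).
theta_d = theta_hat + M @ X.T @ resid / n
dof = max(n - int(np.sum(theta_hat != 0)), 1)
sigma_hat = np.sqrt(resid @ resid / dof)
se = sigma_hat * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)

# Step 4: two-sided p-values, Benjamini-Hochberg step-up at level q,
# and the sign of each discovery as the directional decision.
pvals = 2 * norm.sf(np.abs(theta_d / se))
order = np.argsort(pvals)
passed = pvals[order] <= q * np.arange(1, p + 1) / p
k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
selected = order[:k]
print("signed discoveries:",
      [(j, int(np.sign(theta_d[j]))) for j in selected])
```

On this synthetic example the directional error of a discovery $j$ is either a false selection ($\theta_j = 0$) or a sign error ($\operatorname{sign}(\widehat\theta^d_j) \neq \operatorname{sign}(\theta_j)$), which is the notion of "directional" FDR the abstract refers to.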
