False Discovery Rate Control via Debiased Lasso

Abstract

We consider the problem of variable selection in high-dimensional statistical models, where the goal is to report a set of variables, out of many predictors $X_1, \dotsc, X_p$, that are relevant to a response of interest. For the high-dimensional linear model, where the number of parameters exceeds the number of samples ($p > n$), we propose a variable selection procedure and prove that it controls the "directional" false discovery rate (FDR) below a pre-assigned significance level $q \in [0,1]$. We further analyze the statistical power of our framework and show that for designs with subgaussian rows and a common precision matrix $\Omega \in \mathbb{R}^{p \times p}$, if the minimum nonzero parameter $\theta_{\min}$ satisfies
$$\sqrt{n}\,\theta_{\min} - \sigma \sqrt{2\Big(\max_{i\in [p]}\Omega_{ii}\Big)\log\Big(\frac{2p}{qs_0}\Big)} \to \infty\,,$$
then this procedure achieves asymptotic power one. Our framework is built upon the debiasing approach and assumes the standard condition $s_0 = o(\sqrt{n}/(\log p)^2)$, where $s_0$ indicates the number of true positives among the $p$ features. Notably, this framework achieves exact directional FDR control without any assumption on the amplitude of the unknown regression parameters, and it requires no knowledge of the distribution of the covariates or of the noise level. We test our method on synthetic and real data to assess its performance and to corroborate our theoretical results.
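As we read the abstract, the procedure combines debiased-lasso test statistics with an FDR-controlling selection rule that also reports signs. The sketch below is a minimal illustration of that pipeline under our own assumptions, not the paper's exact algorithm: a pseudo-inverse of the sample covariance stands in for the nodewise-lasso precision estimate, `LassoCV` replaces whatever penalty choice the paper uses, a Benjamini-Hochberg step-up rule stands in for the paper's selection rule, and the synthetic data and residual-based noise estimate are purely for demonstration.

```python
# Sketch (assumptions labeled inline): debiased lasso + BH-type selection
# at target FDR level q, reporting signed ("directional") discoveries.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s0, q = 200, 400, 10, 0.1           # q: target FDR level
theta = np.zeros(p)
theta[:s0] = rng.choice([-1.0, 1.0], s0)   # signed true signals (synthetic)
X = rng.standard_normal((n, p))
y = X @ theta + rng.standard_normal(n)

# Step 1: initial lasso fit (assumption: cross-validated penalty).
lasso = LassoCV(cv=5).fit(X, y)
theta_hat = lasso.coef_
resid = y - X @ theta_hat

# Step 2: precision-matrix surrogate M (assumption: pseudo-inverse of the
# sample covariance in place of the nodewise-lasso estimate of Omega).
Sigma_hat = X.T @ X / n
M = np.linalg.pinv(Sigma_hat)

# Step 3: debiased estimator and plug-in standard errors; the noise level
# sigma is estimated from lasso residuals (assumption).
theta_d = theta_hat + M @ X.T @ resid / n
dof = max(n - int(np.sum(theta_hat != 0)), 1)
sigma_hat = np.sqrt(resid @ resid / dof)
se = sigma_hat * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)

# Step 4: two-sided p-values, Benjamini-Hochberg step-up at level q,
# and the sign of each discovery as the directional decision.
pvals = 2 * norm.sf(np.abs(theta_d / se))
order = np.argsort(pvals)
passed = pvals[order] <= q * np.arange(1, p + 1) / p
k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
selected = order[:k]
print("signed discoveries:",
      [(j, int(np.sign(theta_d[j]))) for j in selected])
```

On this synthetic example the directional error of a discovery $j$ is either a false selection ($\theta_j = 0$) or a sign error ($\operatorname{sign}(\widehat\theta^d_j) \neq \operatorname{sign}(\theta_j)$), which is the notion of "directional" FDR the abstract refers to.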
