
Double-estimation-friendly inference for high-dimensional misspecified models

Abstract

All models may be wrong -- but that is not necessarily a problem for inference. Consider the standard $t$-test for the significance of a variable $X$ for predicting response $Y$ whilst controlling for $p$ other covariates $Z$ in a random design linear model. This yields correct asymptotic type~I error control for the null hypothesis that $X$ is conditionally independent of $Y$ given $Z$ under an \emph{arbitrary} regression model of $Y$ on $(X, Z)$, provided that a linear regression model for $X$ on $Z$ holds. An analogous robustness to misspecification, which we term the "double-estimation-friendly" (DEF) property, also holds for Wald tests in generalised linear models, with some small modifications. In this expository paper we explore this phenomenon, and propose methodology for high-dimensional regression settings that respects the DEF property. We advocate specifying (sparse) generalised linear regression models for both $Y$ and the covariate of interest $X$; our framework gives valid inference for the conditional independence null if either of these holds. In the special case where both specifications are linear, our proposal amounts to a small modification of the popular debiased Lasso test. We also investigate constructing confidence intervals for the regression coefficient of $X$ via inverting our tests; these have coverage guarantees even in partially linear models where the contribution of $Z$ to $Y$ can be arbitrary. Numerical experiments demonstrate the effectiveness of the methodology.
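The low-dimensional phenomenon behind the DEF property can be checked with a small Monte Carlo sketch (NumPy only). This is not the high-dimensional methodology of the paper; it only illustrates the classical $t$-test claim. The data-generating processes, the hypothetical helper names `ols_t_stat` and `rejection_rate`, and the normal critical value 1.96 are illustrative assumptions. In both scenarios $Y$ depends on $Z$ nonlinearly and with heteroscedastic noise but not on $X$, so the conditional independence null holds. When $X$ follows a linear model in $Z$ with independent Gaussian noise, the rejection rate should sit near the nominal 5%; when $X$ instead shares a nonlinear term in $Z$ with $Y$, neither working model is correct and the test should over-reject.

```python
import numpy as np


def ols_t_stat(y, x, Z):
    """Classical OLS t-statistic for the coefficient of x in y ~ 1 + x + Z."""
    n = len(y)
    D = np.column_stack([np.ones(n), x, Z])  # design: intercept, x, then Z
    coef, _, _, _ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    df = n - D.shape[1]
    sigma2 = resid @ resid / df  # homoscedastic residual variance estimate
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return coef[1] / np.sqrt(cov[1, 1])


def rejection_rate(x_linear, n=200, p=5, reps=2000, seed=0):
    """Empirical size of the 5%-level t-test under the null (Y does not depend on X).

    Hypothetical simulation design, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        Z = rng.standard_normal((n, p))
        # Y: nonlinear in Z with heteroscedastic noise, and independent of X given Z.
        y = (np.sin(2 * Z[:, 0]) + Z[:, 1] ** 2
             + (1 + np.abs(Z[:, 2])) * rng.standard_normal(n))
        if x_linear:
            # Correctly specified linear model for X on Z.
            x = Z @ np.linspace(1.0, 0.2, p) + rng.standard_normal(n)
        else:
            # Misspecified: X shares the nonlinear term Z_1^2 with Y.
            x = Z[:, 1] ** 2 + rng.standard_normal(n)
        t = ols_t_stat(y, x, Z)
        rejections += int(abs(t) > 1.96)  # normal approximation to the t critical value
    return rejections / reps


if __name__ == "__main__":
    print(rejection_rate(x_linear=True))   # expected to be close to 0.05
    print(rejection_rate(x_linear=False))  # expected to be far above 0.05
```

One way to see why the first case works: by the Frisch-Waugh-Lovell theorem the coefficient of $X$ depends on $X$ only through its residuals after regressing on $(1, Z)$, and when the linear model for $X$ holds, those residuals are pure noise that is independent of $(Y, Z)$ under the null.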
