Inference for high-dimensional nested regression

This essay concerns estimation and inference for regression parameters when a high-dimensional set of regressors is endogenous. Given a nested first-stage model for the endogenous regressors and a second-stage model for the response variable, we estimate the second-stage regression parameter using a two-stage lasso procedure. We show that our estimator achieves good performance with respect to estimation error, and we include a novel analysis of the compatibility condition in the context of the second-stage model. We also study an asymptotic linearization of the second-stage lasso estimator, for which we derive asymptotic normality. Using these results, we construct valid confidence intervals for low-dimensional components of the high-dimensional regression vector. We complement our asymptotic theory with empirical studies, which demonstrate the relevance of our method in finite samples.
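To make the two-stage idea concrete, the following is a minimal sketch of a generic two-stage lasso, not the estimator as specified in the essay. It assumes a plain linear first stage in which each endogenous regressor is lasso-regressed on a matrix of instruments `Z`, and a second stage in which the response is lasso-regressed on the first-stage fitted values. The essay's nested first-stage structure, its tuning-parameter choices, and the linearization step used for inference are not reproduced here. The names `lasso_cd`, `two_stage_lasso`, `Z`, `lam1`, and `lam2` are hypothetical and chosen only for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (illustrative, unoptimized).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1, where X is a
    list of n rows (each a list of p floats) and y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                b[j] = new_bj
    return b


def two_stage_lasso(Z, X, y, lam1, lam2):
    """Generic two-stage lasso under an assumed linear first stage.

    Stage 1: lasso-regress each column of X on the instruments Z and
             form fitted values X_hat.
    Stage 2: lasso-regress y on X_hat to estimate the second-stage
             coefficient vector.
    """
    n, p, q = len(X), len(X[0]), len(Z[0])
    X_hat = [[0.0] * p for _ in range(n)]
    for j in range(p):
        x_j = [X[i][j] for i in range(n)]
        gamma_j = lasso_cd(Z, x_j, lam1)
        for i in range(n):
            X_hat[i][j] = sum(Z[i][k] * gamma_j[k] for k in range(q))
    return lasso_cd(X_hat, y, lam2)
```

The coordinate-descent solver is written in plain Python so that the sketch has no dependencies; in practice an established lasso implementation would be used for both stages. Both penalty levels are left as free inputs because the abstract does not state how they are chosen.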