Estimation and Inference about Conditional Average Treatment Effect and Other Structural Functions

21 February 2017

Abstract

Our framework can be viewed as inference on low-dimensional nonparametric functions in the presence of a high-dimensional nuisance function (where dimensionality refers to the number of covariates). Specifically, we consider the setting where we have a signal $Y=Y(\eta_0)$ that is an unbiased predictor of causal/structural objects such as the treatment effect, the structural derivative, the outcome given treatment, and others, conditional on a set of very high-dimensional controls $Z$. We are interested in simpler, lower-dimensional nonparametric summaries of $Y$, namely $g(x)=E[Y|X=x]$, conditional on a low-dimensional subset of covariates $X$. The signal $Y=Y(\eta_0)$ depends on an unknown nuisance function $\eta_0(Z)$. In the first stage, we need to learn the function $\eta_0(Z)$ using any machine learning method that is able to approximate $\eta_0$ accurately under very high dimensionality of $Z$. For example, under approximate sparsity with respect to a dictionary, $\ell_1$-penalized methods can be used; in other settings, tools such as deep neural networks can be used. To make the subsequent inference valid, we make the signal orthogonal to perturbations of $\eta$ around $\eta_0$. As a result, the second-stage low-dimensional nonparametric inference enjoys quasi-oracle properties, as if we knew $\eta_0$. In the second stage, we approximate the target function $g(x)$ by a linear form $p(x)'\beta_0$, where $\beta_0$ is the Best Linear Predictor parameter. We develop a complete set of results about estimation and approximately Gaussian inference on $x \mapsto p(x)'\beta_0$ and $x \mapsto g(x)$. If $p(x)$ is sufficiently rich and $g(x)$ admits a good approximation, then $g(x)$ is automatically targeted by the inference; otherwise, the best linear approximation $p(x)'\beta_0$ to $g(x)$ is targeted. When $p(x)$ is specified as a collection of group indicators, $p(x)'\beta_0$ describes group-average treatment effects (GATEs).
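
To make the two-stage logic concrete, here is a minimal simulated sketch, not the authors' implementation. It assumes a binary treatment and uses the doubly robust (AIPW) signal as one possible orthogonal signal; the abstract does not specify a signal, so that choice is an assumption. Plain least squares stands in for the first-stage machine learning method, and all function names (`fit_linear`, `orthogonal_signal`, `blp`) and the simulated design are hypothetical. The second stage regresses the signal on group indicators $p(x)$, so the coefficients estimate GATEs.

```python
import numpy as np

def fit_linear(Z, y):
    """Least-squares fit with intercept (a stand-in for any ML learner)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Znew: np.column_stack([np.ones(len(Znew)), Znew]) @ coef

def orthogonal_signal(Z, D, Yobs, n_folds=2, seed=0):
    """Cross-fitted doubly robust (AIPW) signal for the treatment effect.

    Assumption: this particular signal is chosen for illustration only.
    """
    n = len(Yobs)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    signal = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance functions are fit on the other folds only (cross-fitting).
        mu1 = fit_linear(Z[train & (D == 1)], Yobs[train & (D == 1)])
        mu0 = fit_linear(Z[train & (D == 0)], Yobs[train & (D == 0)])
        # Linear probability model for the propensity score, clipped away from 0 and 1.
        e = np.clip(fit_linear(Z[train], D[train].astype(float))(Z[test]), 0.05, 0.95)
        m1, m0 = mu1(Z[test]), mu0(Z[test])
        d, y = D[test], Yobs[test]
        signal[test] = m1 - m0 + d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e)
    return signal

def blp(P, Y):
    """Second stage: OLS of the signal on p(X), with robust (sandwich) standard errors."""
    n = len(Y)
    Q_inv = np.linalg.inv(P.T @ P / n)
    beta = Q_inv @ (P.T @ Y / n)
    resid = Y - P @ beta
    meat = (P * resid[:, None] ** 2).T @ P / n
    cov = Q_inv @ meat @ Q_inv / n
    return beta, np.sqrt(np.diag(cov))

# Simulated data (hypothetical design): the effect is 1 when X <= 0 and 2 when X > 0.
rng = np.random.default_rng(42)
n = 4000
Z = rng.normal(size=(n, 5))          # controls
X = Z[:, 0]                          # low-dimensional covariate of interest
D = rng.binomial(1, 0.5, size=n)     # randomized binary treatment
tau = 1.0 + (X > 0)
Yobs = Z @ np.array([1.0, -0.5, 0.25, 0.0, 0.0]) + D * tau + rng.normal(size=n)

signal = orthogonal_signal(Z, D, Yobs)
P = np.column_stack([X <= 0, X > 0]).astype(float)   # group indicators p(x)
beta, se = blp(P, signal)
print("GATE estimates:", beta)
print("robust standard errors:", se)
```

With group indicators as the basis, each coefficient is simply the within-group average of the signal, which is why $p(x)'\beta_0$ can be read as a GATE. A richer basis (for example polynomials or splines in $X$) would slot into `blp` unchanged.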
