Statistical inference in sparse high-dimensional additive models

24 March 2016

Abstract

In this paper we discuss the estimation of a nonparametric component $f_1$ of a nonparametric additive model $Y = f_1(X_1) + \dots + f_q(X_q) + \epsilon$. We allow the number $q$ of additive components to grow to infinity and we make sparsity assumptions about the number of nonzero additive components. We compare this estimation problem with that of estimating $f_1$ in the oracle model $Z = f_1(X_1) + \epsilon$, for which the additive components $f_2,\dots,f_q$ are known. We construct a two-step presmoothing-and-resmoothing estimator of $f_1$ and state finite-sample bounds for the difference between our estimator and some smoothing estimators $\hat f_1^{\text{(oracle)}}$ in the oracle model. In an asymptotic setting these bounds can be used to show asymptotic equivalence of our estimator and the oracle estimators; the paper thus shows that, asymptotically, under strong enough sparsity conditions, knowledge of $f_2,\dots,f_q$ has no effect on estimation accuracy. Our first step is to estimate $f_1$ with an undersmoothed estimator based on near-orthogonal projections with a group Lasso bias correction. We then construct pseudo responses $\hat Y$ by evaluating a debiased modification of our undersmoothed estimator of $f_1$ at the design points. In the second step the smoothing method of the oracle estimator $\hat f_1^{\text{(oracle)}}$ is applied to a nonparametric regression problem with responses $\hat Y$ and covariates $X_1$. Our mathematical exposition centers primarily on establishing properties of the presmoothing estimator. We present simulation results demonstrating close-to-oracle performance of our estimator in practical applications.
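The two-step idea can be illustrated with a small toy simulation. The sketch below is not the estimator of the paper: the presmoothing step here is plain backfitting with bin means on a fine partition (a stand-in assumed for illustration, in place of near-orthogonal projections with a group Lasso bias correction), the setting is low-dimensional ($q = 3$), and all function names, component functions, and tuning values are hypothetical. It only shows the mechanics of presmoothing, forming pseudo responses at the design points, and resmoothing with the same kernel smoother that the oracle would use.

```python
import math
import random


def bin_index(x, n_bins):
    """Index of the equal-width bin on [0, 1] that contains x."""
    return min(int(x * n_bins), n_bins - 1)


def bin_means(x, r, n_bins):
    """Piecewise-constant fit of r on x, evaluated at x and centred to mean zero."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, ri in zip(x, r):
        b = bin_index(xi, n_bins)
        sums[b] += ri
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    fitted = [means[bin_index(xi, n_bins)] for xi in x]
    centre = sum(fitted) / len(fitted)
    return [v - centre for v in fitted]


def presmooth(X, y, n_bins=20, n_sweeps=20):
    """Step 1 (toy stand-in): undersmoothed additive fit by backfitting.

    X is a list of q covariate columns. Returns the fitted first component
    at the design points, which serves as the pseudo responses.
    """
    n, q = len(y), len(X)
    y_bar = sum(y) / n
    fits = [[0.0] * n for _ in range(q)]
    for _ in range(n_sweeps):
        for j in range(q):
            # Partial residuals: remove the mean and all other components.
            partial = [
                y[i] - y_bar - sum(fits[k][i] for k in range(q) if k != j)
                for i in range(n)
            ]
            fits[j] = bin_means(X[j], partial, n_bins)
    return fits[0]


def kernel_smooth(x, y, grid, h):
    """Step 2: Nadaraya-Watson estimator with a Gaussian kernel."""
    out = []
    for g in grid:
        w = [math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x]
        out.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return out


# Simulated sparse additive model: f_3 is identically zero (assumed setup).
random.seed(0)
n, q = 400, 3
X = [[random.random() for _ in range(n)] for _ in range(q)]
f = [lambda x: math.sin(2 * math.pi * x), lambda x: x - 0.5, lambda x: 0.0]
eps = [random.gauss(0.0, 0.3) for _ in range(n)]
y = [sum(f[j](X[j][i]) for j in range(q)) + eps[i] for i in range(n)]

# Oracle responses Z = f_1(X_1) + eps, centred to match the centred first component.
z = [f[0](X[0][i]) + eps[i] for i in range(n)]
z_bar = sum(z) / n
z = [zi - z_bar for zi in z]

grid = [0.05 + 0.9 * k / 49 for k in range(50)]
h = 0.08  # bandwidth shared by the oracle and the resmoothing step

pseudo = presmooth(X, y)                      # pseudo responses at the design points
two_step = kernel_smooth(X[0], pseudo, grid, h)
oracle = kernel_smooth(X[0], z, grid, h)

gap = sum(abs(a - b) for a, b in zip(two_step, oracle)) / len(grid)
print("mean absolute gap between two-step and oracle estimates:", gap)
```

The presmoothing partition is deliberately finer than the bandwidth used afterwards, so that the smoothing applied in the second step dominates; in this toy setting the printed gap gives a rough sense of how closely the two-step estimate tracks the oracle estimate.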
