Statistical inference in sparse high-dimensional additive models

Abstract

In this paper we discuss the estimation of a nonparametric component $f_1$ of a nonparametric additive model $Y = f_1(X_1) + \dots + f_q(X_q) + \epsilon$. We allow the number $q$ of additive components to grow to infinity and we make sparsity assumptions about the number of nonzero additive components. We compare this estimation problem with that of estimating $f_1$ in the oracle model $Z = f_1(X_1) + \epsilon$, for which the additive components $f_2,\dots,f_q$ are known. We construct a two-step presmoothing-and-resmoothing estimator of $f_1$ and state finite-sample bounds for the difference between our estimator and some smoothing estimators $\hat f_1^{\text{(oracle)}}$ in the oracle model. In an asymptotic setting these bounds can be used to show asymptotic equivalence of our estimator and the oracle estimators; the paper thus shows that, asymptotically, under strong enough sparsity conditions, knowledge of $f_2,\dots,f_q$ has no effect on estimation accuracy. Our first step is to estimate $f_1$ with an undersmoothed estimator based on near-orthogonal projections with a group Lasso bias correction. We then construct pseudo responses $\hat Y$ by evaluating a debiased modification of our undersmoothed estimator of $f_1$ at the design points. In the second step the smoothing method of the oracle estimator $\hat f_1^{\text{(oracle)}}$ is applied to a nonparametric regression problem with responses $\hat Y$ and covariates $X_1$. Our mathematical exposition centers primarily on establishing properties of the presmoothing estimator. We present simulation results demonstrating close-to-oracle performance of our estimator in practical applications.
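
The oracle benchmark described in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it only simulates a two-component additive model, forms the oracle responses $Z = Y - f_2(X_2)$ from a component that is assumed known, and applies one possible smoothing method. The Nadaraya-Watson smoother with a Gaussian kernel, the choices of $f_1$ and $f_2$, the noise level, the bandwidth, and the names `nw_smooth` and `oracle_responses` are all assumptions made for illustration. The presmoothing step with group Lasso bias correction is not implemented here.

```python
import numpy as np


def nw_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate (Gaussian kernel) of E[y | x] at the grid points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Kernel weights: one row per grid point, one column per observation.
    u = (grid[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w @ y) / w.sum(axis=1)


def oracle_responses(y, known_components):
    """Oracle responses Z = Y - sum_{j>=2} f_j(X_j), with f_2, ..., f_q assumed known."""
    return np.asarray(y, dtype=float) - np.sum(known_components, axis=0)


# Simulated additive model with q = 2 (illustrative choices, not from the paper).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.uniform(-1.0, 1.0, n)
x2 = rng.uniform(-1.0, 1.0, n)
f1 = np.sin(np.pi * x1)
f2 = x2**2 - 1.0 / 3.0
y = f1 + f2 + 0.1 * rng.standard_normal(n)

# Oracle model: remove the known component, then smooth Z on X_1.
z = oracle_responses(y, [f2])
grid = np.linspace(-0.8, 0.8, 9)
f1_oracle = nw_smooth(x1, z, grid, bandwidth=0.1)
max_err = np.max(np.abs(f1_oracle - np.sin(np.pi * grid)))
```

In the two-step procedure the same smoother would be applied a second time with the pseudo responses $\hat Y$ in place of `z`, for example `nw_smooth(x1, pseudo_y, grid, bandwidth=0.1)`, where `pseudo_y` is a hypothetical array holding the debiased presmoothing estimate evaluated at the design points.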
