Randomized incomplete U-statistics in high dimensions

Abstract

This paper studies inference for the mean vector of a high-dimensional U-statistic. In the era of Big Data, the dimension d of the U-statistic and the sample size n of the observations both tend to be large, and computing the U-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for U-statistics are even more computationally expensive. To overcome this computational bottleneck, incomplete U-statistics, obtained by sampling fewer terms of the U-statistic, are attractive alternatives. In this paper, we introduce randomized incomplete U-statistics with sparse weights whose computational cost can be made independent of the order of the U-statistic. We derive non-asymptotic Gaussian approximation error bounds for the randomized incomplete U-statistics in high dimensions, namely in cases where the dimension d is possibly much larger than the sample size n, for both non-degenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete U-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite-sample validity of the proposed bootstrap methods. Our methods are illustrated by an application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.
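
To make the idea of sampling fewer terms concrete, the following is a minimal sketch, not the paper's procedure. The abstract does not state the sampling scheme, so the sketch assumes one simple choice: drawing a fixed number of index subsets uniformly at random, with replacement across draws. The names `complete_u_statistic`, `incomplete_u_statistic`, `variance_kernel`, and the parameter `budget` are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations
from math import comb


def _add(total, value):
    """Coordinate-wise running sum of d-dimensional kernel values."""
    return list(value) if total is None else [t + v for t, v in zip(total, value)]


def complete_u_statistic(data, kernel, r):
    """Average a symmetric kernel over all C(n, r) subsets of size r."""
    total = None
    for idx in combinations(range(len(data)), r):
        total = _add(total, kernel(*(data[i] for i in idx)))
    count = comb(len(data), r)
    return [t / count for t in total]


def incomplete_u_statistic(data, kernel, r, budget, rng):
    """Average the kernel over `budget` randomly drawn subsets of size r.

    Assumption (not taken from the abstract): subsets are drawn uniformly,
    independently of each other, so the same subset may be drawn twice.
    The cost is `budget` kernel evaluations instead of C(n, r).
    """
    total = None
    for _ in range(budget):
        idx = rng.sample(range(len(data)), r)  # r distinct indices
        total = _add(total, kernel(*(data[i] for i in idx)))
    return [t / budget for t in total]


def variance_kernel(x, y):
    """Order-2 kernel whose U-statistic is the coordinate-wise sample variance."""
    return [(a - b) ** 2 / 2.0 for a, b in zip(x, y)]
```

For example, with `data` given as a list of n vectors of length d, `incomplete_u_statistic(data, variance_kernel, 2, budget=1000, rng=random.Random(0))` evaluates the kernel 1000 times regardless of n, whereas the complete version evaluates it n(n-1)/2 times.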
