Dimension-agnostic inference using cross U-statistics

Abstract

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a refined test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
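
To make the recipe concrete, here is a minimal sketch of what a sample-split, self-normalized statistic might look like for one-sample mean testing (null hypothesis: the mean is zero). It assumes the bilinear kernel h(x, y) = xᵀy, a random split into two halves, and a one-sided standard normal cutoff; the function name `cross_u_mean_test` and its parameters are illustrative choices, not names taken from the paper, and the paper's exact construction may differ in details.

```python
import numpy as np
from statistics import NormalDist


def cross_u_mean_test(X, alpha=0.05, rng=None):
    """Illustrative cross U-statistic test of H0: E[X] = 0.

    Assumptions (not from the source text): kernel h(x, y) = x @ y,
    a random half/half split, and a one-sided normal threshold.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 4:
        raise ValueError("need at least 4 samples to split and studentize")

    # Sample splitting: shuffle the rows, then cut into two halves.
    perm = np.random.default_rng(rng).permutation(n)
    half = n // 2
    X1, X2 = X[perm[:half]], X[perm[half:]]

    # Project each first-half point onto the second-half sample mean.
    # The average of f equals the mean of h(X_i, X_j) over pairs with
    # i in the first half and j in the second half, i.e. only the
    # off-diagonal block of the kernel matrix is used.
    direction = X2.mean(axis=0)
    f = X1 @ direction

    # Self-normalization: studentize the projected values.
    stat = np.sqrt(half) * f.mean() / f.std(ddof=1)

    # Compare with the upper-alpha quantile of the standard normal.
    threshold = NormalDist().inv_cdf(1 - alpha)
    return float(stat), bool(stat > threshold)


# Usage on the scenario from the abstract: 100 samples in 20 dimensions.
data = np.random.default_rng(0).normal(size=(100, 20)) + 1.0
stat, reject = cross_u_mean_test(data, rng=1)
```

Because the rejection threshold is a fixed normal quantile, nothing in the calibration step refers to the dimension or to its ratio with the sample size, which is the sense in which such a test is dimension-agnostic.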
