Nonasymptotic one- and two-sample tests in high dimension with unknown covariance structure

Abstract

Let $\mathbf{X} = (X_i)_{1\leq i \leq n}$ be an i.i.d. sample of square-integrable variables in $\mathbb{R}^d$, with common expectation $\mu$ and covariance matrix $\Sigma$, both unknown. We consider the problem of testing whether $\mu$ is $\eta$-close to zero, i.e. $\|\mu\| \leq \eta$ against $\|\mu\| \geq \eta + \delta$; we also tackle the more general two-sample mean closeness (also known as relevant difference) testing problem. The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance $\delta$ such that both the Type I and Type II errors can be controlled at a given level. The main technical tools are concentration inequalities, first for a suitable estimator of $\|\mu\|^2$ used as a test statistic, and second for estimators of the operator and Frobenius norms of $\Sigma$, which enter the quantiles of that test statistic. These properties are obtained for Gaussian and bounded distributions. Particular attention is given to the dependence on the pseudo-dimension $d_*$ of the distribution, defined as $d_* := \|\Sigma\|_2^2/\|\Sigma\|_\infty^2$. In particular, for $\eta=0$, the minimum separation distance is $\Theta\big(d_*^{1/4}\sqrt{\|\Sigma\|_\infty/n}\big)$, in contrast with the minimax estimation distance for $\mu$, which is $\Theta\big(d_e^{1/2}\sqrt{\|\Sigma\|_\infty/n}\big)$ (where $d_e := \|\Sigma\|_1/\|\Sigma\|_\infty$). This generalizes a phenomenon spelled out in particular by Baraud (2002).
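
To make the gap between the two rates concrete, here is a minimal Python sketch that evaluates $d_*$, $d_e$ and both rate expressions from the eigenvalues of a covariance matrix. It rests on assumptions not spelled out in the abstract: $\|\Sigma\|_\infty$ is read as the operator norm (largest eigenvalue), $\|\Sigma\|_2$ as the Frobenius norm, and $\|\Sigma\|_1$ as the trace; the constants hidden by $\Theta$ are set to 1. The function names `pseudo_dimensions` and `rate_orders` are hypothetical and chosen for illustration only.

```python
import math


def pseudo_dimensions(eigenvalues):
    """Return (d_star, d_e) computed from the eigenvalues of Sigma.

    Assumed reading of the norms (not confirmed by the abstract):
    operator norm = largest eigenvalue, squared Frobenius norm = sum of
    squared eigenvalues, ||Sigma||_1 = trace = sum of eigenvalues.
    """
    lam = [float(x) for x in eigenvalues]
    if not lam or min(lam) < 0 or max(lam) == 0:
        raise ValueError("need a nonzero positive semidefinite spectrum")
    op_norm = max(lam)
    frob_sq = sum(x * x for x in lam)
    trace = sum(lam)
    d_star = frob_sq / op_norm ** 2
    d_e = trace / op_norm
    return d_star, d_e


def rate_orders(eigenvalues, n):
    """Return (testing_rate, estimation_rate) with Theta-constants set to 1."""
    d_star, d_e = pseudo_dimensions(eigenvalues)
    scale = math.sqrt(max(eigenvalues) / n)
    return d_star ** 0.25 * scale, math.sqrt(d_e) * scale


if __name__ == "__main__":
    # Isotropic case Sigma = I_d with d = 16 and n = 100 observations.
    testing, estimation = rate_orders([1.0] * 16, n=100)
    print(f"testing rate order:    {testing:.3f}")
    print(f"estimation rate order: {estimation:.3f}")
```

In the isotropic case $\Sigma = I_d$ both pseudo-dimensions equal $d$, so the two expressions reduce to $d^{1/4}/\sqrt{n}$ and $d^{1/2}/\sqrt{n}$: the testing rate grows with dimension more slowly than the estimation rate.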
