
Quantitative asymptotics of graphical projection pursuit

Abstract

There is a result of Diaconis and Freedman which says that, in a limiting sense, for large collections of high-dimensional data most one-dimensional projections of the data are approximately Gaussian. This paper gives quantitative versions of that result. For a set of deterministic vectors $\{x_i\}_{i=1}^n$ in $\mathbb{R}^d$ with $n$ and $d$ fixed, let $\theta\in\mathbb{S}^{d-1}$ be a random point of the sphere and let $\mu_n^\theta$ denote the random measure which puts mass $\frac{1}{n}$ at each of the points $\langle x_1,\theta\rangle,\ldots,\langle x_n,\theta\rangle$. For a fixed bounded Lipschitz test function $f$, $Z$ a standard Gaussian random variable and $\sigma^2$ a suitable constant, an explicit bound is derived for the quantity $\mathbb{P}\left[\left|\int f\,d\mu_n^\theta-\mathbb{E}f(\sigma Z)\right|>\epsilon\right]$. A bound is also given for $\mathbb{P}\left[d_{BL}(\mu_n^\theta, N(0,\sigma^2))>\epsilon\right]$, where $d_{BL}$ denotes the bounded-Lipschitz distance, which yields a lower bound on the waiting time to finding a non-Gaussian projection of the $\{x_i\}$ if directions are tried independently and uniformly on $\mathbb{S}^{d-1}$.
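The first quantity in the abstract can be estimated numerically. The following is a minimal simulation sketch, not the paper's method, and several choices in it are assumptions made only for illustration: the data are random sign vectors, the test function is $f=\cos$ (bounded by 1 and 1-Lipschitz, with the closed form $\mathbb{E}\cos(\sigma Z)=e^{-\sigma^2/2}$), and $\sigma^2$ is taken to be $\frac{1}{nd}\sum_i |x_i|^2$, the average second moment of $\mu_n^\theta$ over uniform $\theta$, since the abstract only calls $\sigma^2$ "a suitable constant". The helper names `random_direction` and `projection_deviation` are hypothetical.

```python
import numpy as np


def random_direction(d, rng):
    """Draw theta uniformly from the unit sphere S^{d-1} by normalizing a Gaussian vector."""
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)


def projection_deviation(x, f, gaussian_mean, rng):
    """Return |int f d(mu_n^theta) - E f(sigma Z)| for one random direction theta."""
    theta = random_direction(x.shape[1], rng)
    projections = x @ theta            # the n points <x_i, theta>
    return abs(np.mean(f(projections)) - gaussian_mean)


rng = np.random.default_rng(0)

# Assumed example data: n deterministic-once-drawn sign vectors in R^d.
n, d = 2000, 200
x = rng.choice([-1.0, 1.0], size=(n, d))

# Assumed choice of sigma^2: average squared norm divided by d.
sigma2 = np.sum(x ** 2) / (n * d)

# Test function f = cos, for which E cos(sigma Z) = exp(-sigma^2 / 2).
f = np.cos
gaussian_mean = np.exp(-sigma2 / 2)

# Monte Carlo estimate of P[ |int f d(mu_n^theta) - E f(sigma Z)| > eps ] over theta.
eps = 0.1
devs = np.array([projection_deviation(x, f, gaussian_mean, rng) for _ in range(200)])
exceed_fraction = np.mean(devs > eps)

print("sigma^2:", sigma2)
print("median deviation:", np.median(devs))
print("estimated exceedance probability:", exceed_fraction)
```

If $p$ denotes the probability that a single uniformly chosen direction deviates by more than $\epsilon$, then the number of independent tries until the first such direction is geometric with mean $1/p$, which is how an upper bound on $p$ translates into a lower bound on the waiting time.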
