The Chi-Square Test of Distance Correlation

Abstract

Distance correlation has gained much recent attention in the statistics and machine learning communities: the sample statistic is straightforward to compute, works for any metric or kernel choice, and asymptotically equals zero if and only if the variables are independent. One major bottleneck is the testing process: the null distribution of distance correlation depends on the metric choice and the marginal distributions, and cannot be easily estimated. To compute a p-value, the standard approach is to estimate the null distribution via permutation, which is very costly for large amounts of data. In this paper, we propose a chi-square distribution to approximate the null distribution of the unbiased distance correlation. We prove that the chi-square distribution either equals or closely approximates the null distribution, and always dominates the null distribution in the upper tail. The resulting distance correlation chi-square test requires neither permutation nor parameter estimation, works with any metric of strong negative type or any characteristic kernel, is valid and universally consistent for testing independence, and achieves finite-sample testing power similar to that of the standard permutation test. For one-dimensional data using Euclidean distance, testing independence using distance correlation now runs in linear time, rendering it comparable in speed to the Pearson correlation t-test. The results are supported and demonstrated via simulations.
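
To make the contrast between the two testing procedures concrete, the following is a minimal sketch in plain Python, not the reference implementation. It computes the unbiased distance correlation for one-dimensional data with Euclidean distance using the standard U-centering construction, then derives a p-value in two ways: by permutation, and by a chi-square tail probability. The abstract does not spell out the test statistic, so the form used here, comparing n * Dcor + 1 against a chi-square distribution with one degree of freedom, is an assumption, and all function names are hypothetical. The sketch is quadratic in the sample size and does not attempt the linear-time computation mentioned above.

```python
import math
import random


def u_centered(points):
    """U-centered Euclidean distance matrix of 1-D samples (requires n >= 4)."""
    n = len(points)
    d = [[abs(a - b) for b in points] for a in points]
    row = [sum(r) for r in d]  # row sums equal column sums by symmetry
    total = sum(row)
    out = [[0.0] * n for _ in range(n)]  # diagonal stays zero
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = (d[i][j]
                             - row[i] / (n - 2)
                             - row[j] / (n - 2)
                             + total / ((n - 1) * (n - 2)))
    return out


def inner(a, b):
    """Unbiased inner product of two U-centered matrices."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 3))


def unbiased_dcorr(x, y):
    """Unbiased sample distance correlation; can be slightly negative."""
    a, b = u_centered(x), u_centered(y)
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom > 0 else 0.0


def chi2_pvalue(x, y):
    """ASSUMED test form: P(chi2 with 1 dof > n * Dcor + 1). No permutation."""
    stat = len(x) * unbiased_dcorr(x, y) + 1.0
    if stat <= 0:
        return 1.0
    # Survival function of chi-square(1) at t equals erfc(sqrt(t / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def permutation_pvalue(x, y, reps=200, seed=0):
    """Permutation p-value: recompute the statistic on shuffled copies of y."""
    rng = random.Random(seed)
    observed = unbiased_dcorr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(reps):
        rng.shuffle(y_perm)
        if unbiased_dcorr(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (reps + 1)


if __name__ == "__main__":
    x = [i / 10 for i in range(30)]
    y = [v * v for v in x]  # deterministic nonlinear dependence
    print("unbiased dcorr:", unbiased_dcorr(x, y))
    print("chi-square p-value:", chi2_pvalue(x, y))
    print("permutation p-value:", permutation_pvalue(x, y))
```

The permutation route recomputes the statistic once per replicate, whereas the chi-square route evaluates it once and reads off a tail probability; that difference in cost is the motivation stated in the abstract.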
