
The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

Abstract

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence testing in the statistics community. Kernel-based tests, developed from "kernel mean embeddings", are leading methods for two-sample and independence testing in the machine learning community. In this manuscript we prove that the two-sample statistics are special cases of the independence statistics via an auxiliary label vector, and that the distance-based statistics are equivalent to the kernel-based statistics via a bijective transformation between metrics and kernels. The proposed bijection ensures sample equivalence for the biased, unbiased, and normalized statistics and, among other properties, maps a positive definite kernel to a semimetric of negative type and vice versa. In other words, once a proper label vector is created and the kernel and metric are chosen to correspond to each other under the bijection, running any of the four methods yields exactly the same testing result up to numerical precision. This deepens and unifies the understanding of interpoint-comparison-based methods and enables the rich literatures of distance-based and kernel-based methodologies to communicate directly with each other.
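The following minimal NumPy sketch illustrates both claims of the abstract under stated assumptions; it is not the authors' implementation. The two-sample problem is recast as an independence problem by pooling the samples and attaching a 0/1 label vector with the discrete metric on labels. The metric-to-kernel map is assumed here to be `k = max(d) - d`, chosen for illustration (it may differ in detail from the manuscript's transformation). Because double centering removes the constant `max(d)`, the centered kernel matrix is the negated centered distance matrix, so the biased distance covariance and the biased HSIC coincide, and a permutation test driven by the same random permutations returns the same p-value. All function names (`double_center`, `dcov_biased`, `hsic_biased`, `distance_to_kernel`, `permutation_pvalue`) are hypothetical.

```python
import numpy as np


def pairwise_euclidean(x):
    """(n, p) array -> (n, n) matrix of Euclidean distances."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(m):
    """Return H M H, where H = I - 11^T / n is the centering matrix."""
    n = m.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ m @ h


def dcov_biased(dx, dy):
    """Biased (V-statistic) distance covariance: (1/n^2) sum_ij A_ij B_ij."""
    n = dx.shape[0]
    return np.sum(double_center(dx) * double_center(dy)) / n ** 2


def hsic_biased(kx, ky):
    """Biased HSIC as (1/n^2) tr(Kx H Ky H); the 1/n^2 scaling is assumed
    here so that it matches dcov_biased exactly."""
    n = kx.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return np.trace(kx @ h @ ky @ h) / n ** 2


def distance_to_kernel(d):
    """Assumed (illustrative) metric-to-kernel map: k = max(d) - d."""
    return d.max() - d


def permutation_pvalue(stat, mx, my, reps=500, seed=0):
    """Permutation test: jointly shuffle the rows and columns of my."""
    rng = np.random.default_rng(seed)
    observed = stat(mx, my)
    n = my.shape[0]
    exceed = 0
    for _ in range(reps):
        idx = rng.permutation(n)
        if stat(mx, my[np.ix_(idx, idx)]) >= observed:
            exceed += 1
    return (exceed + 1) / (reps + 1)


# Two-sample data recast as an independence problem: pool the samples and
# attach an auxiliary label vector (0 for sample u, 1 for sample v).
rng = np.random.default_rng(1)
u = rng.normal(0.0, 1.0, size=(15, 2))
v = rng.normal(0.5, 1.0, size=(15, 2))
x = np.vstack([u, v])
labels = np.repeat([0, 1], [15, 15])

# Distance side: Euclidean metric on x, discrete metric on the labels.
dx = pairwise_euclidean(x)
dy = (labels[:, None] != labels[None, :]).astype(float)

# Kernel side: the same matrices pushed through the assumed map.
kx = distance_to_kernel(dx)
ky = distance_to_kernel(dy)  # 1 where labels match, 0 otherwise

dcov = dcov_biased(dx, dy)
hsic = hsic_biased(kx, ky)
p_dist = permutation_pvalue(dcov_biased, dx, dy)
p_kern = permutation_pvalue(hsic_biased, kx, ky)
print(dcov, hsic, p_dist, p_kern)
```

A note on the design choice: Gretton et al. normalize the biased HSIC by (n-1)^2 rather than n^2. That changes the statistic only by a constant factor and therefore leaves the permutation p-value unchanged; the n^2 scaling is used above only so that the two statistics agree numerically as well.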
