A New Robust Partial p-Wasserstein-Based Metric for Comparing Distributions

Abstract

The 2-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the 2-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical 2-Wasserstein distance on $n$ samples in $\mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for the 1-Wasserstein distance. We introduce a new family of distances parameterized by $k \ge 0$, called $k$-RPW, that is based on computing the partial 2-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of the 2-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, the $k$-RPW distance between empirical distributions on $n$ samples in $\mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the 2-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p \in [1,\infty]$. By setting parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation, $p$-Wasserstein, and the Lévy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the 1-Wasserstein, 2-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.
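As a rough illustration of the building block the abstract refers to, the sketch below compares the full and partial 2-Wasserstein costs between two empirical point clouds, one of which carries a small outlier mass, using the Python Optimal Transport (POT) library. The choice of library calls (`ot.emd2`, `ot.partial.partial_wasserstein2`), the transported mass fraction `m`, and the outlier setup are illustrative assumptions; this is not the paper's $k$-RPW construction, only the partial-transport idea underlying its robustness claim.

```python
# Minimal sketch (assumptions noted above): partial vs. full 2-Wasserstein
# between two point clouds in R^2, where one cloud has a small outlier mass.
import numpy as np
import ot                                   # Python Optimal Transport (pip install pot)
from ot.partial import partial_wasserstein2

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                 # samples from a reference distribution
Y = rng.normal(size=(n, 2))
Y[:5] += 20.0                               # a small fraction of far-away outlier mass

a = np.full(n, 1.0 / n)                     # uniform weights on each point cloud
b = np.full(n, 1.0 / n)
M = ot.dist(X, Y)                           # squared Euclidean cost matrix (p = 2)

# Full 2-Wasserstein distance: all mass must be matched, so the outliers dominate.
w2_full = np.sqrt(ot.emd2(a, b, M))

# Partial 2-Wasserstein: transport only 95% of the mass, leaving the worst 5% unmatched.
w2_partial = np.sqrt(partial_wasserstein2(a, b, M, m=0.95))

print(f"full 2-Wasserstein cost:        {w2_full:.3f}")
print(f"partial (m = 0.95) cost (sqrt): {w2_partial:.3f}")
```

On data like this, the full 2-Wasserstein cost is inflated by the 5 displaced points, while the partial cost with `m = 0.95` stays close to the distance between the underlying distributions, which is the behavior the $k$-RPW family is designed to capture.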
