A New Robust Partial p-Wasserstein-Based Metric for Comparing Distributions

The 2-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the 2-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical 2-Wasserstein distance on n samples in R^2 to converge to the true distance at a rate of n^{-1/4}, which is significantly slower than the rate of n^{-1/2} for the 1-Wasserstein distance. We introduce a new family of distances parameterized by k >= 0, called k-RPW, that is based on computing the partial 2-Wasserstein distance. We show that (1) k-RPW satisfies the metric properties, (2) k-RPW is robust to small outlier mass while retaining the sensitivity of the 2-Wasserstein distance to minor geometric differences, and (3) when k is a constant, the k-RPW distance between empirical distributions on n samples in R^2 converges to the true distance at a rate of n^{-1/3}, which is faster than the convergence rate of n^{-1/4} for the 2-Wasserstein distance. Using the partial p-Wasserstein distance, we extend our distance to any p in [1, infinity]. By setting the parameters k or p appropriately, we can reduce our distance to the total variation, p-Wasserstein, and Lévy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the 1-Wasserstein, 2-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.
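To make the building block concrete, the following is a minimal sketch of the partial 2-Wasserstein distance between two small discrete distributions, solved as a linear program with SciPy. This is an illustration of the partial-transport primitive the abstract refers to, not the authors' implementation; the function name, the tiny example data, and the LP formulation are our own assumptions.

```python
# Hedged sketch (not the paper's code): partial 2-Wasserstein between two
# discrete distributions (x, a) and (y, b), transporting only `mass` <= 1
# of the total probability at minimum squared-Euclidean cost.
import numpy as np
from scipy.optimize import linprog

def partial_wasserstein2(x, y, a, b, mass):
    """x: (n, d) support points with weights a; y: (m, d) with weights b."""
    n, m = len(a), len(b)
    # squared Euclidean cost matrix, flattened so plan[i*m + j] = flow i -> j
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1).ravel()
    # marginal constraints are inequalities: some mass may stay untransported
    A_ub = np.zeros((n + m, n * m))
    for i in range(n):
        A_ub[i, i * m:(i + 1) * m] = 1.0   # row sums <= a_i
    for j in range(m):
        A_ub[n + j, j::m] = 1.0            # column sums <= b_j
    b_ub = np.concatenate([a, b])
    # total transported mass is exactly `mass`
    A_eq = np.ones((1, n * m))
    res = linprog(C, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[mass],
                  bounds=(0, None))
    return np.sqrt(max(res.fun, 0.0))
```

For example, placing 0.9 mass at 0 and a 0.1 outlier at 10 against a unit mass at 0, the full 2-Wasserstein distance (mass = 1.0) is sqrt(0.1 * 100) ≈ 3.16, while the partial distance with mass = 0.9 ignores the outlier and is 0 — the robustness effect the k-RPW family exploits.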