Transport Dependency: Optimal Transport Based Dependency Measures

Abstract

Finding meaningful ways to measure the statistical dependency between random variables $\xi$ and $\zeta$ is a timeless statistical endeavor. In recent years, several novel concepts, like the distance covariance, have extended classical notions of dependency to more general settings. In this article, we propose and study an alternative framework that is based on optimal transport. The transport dependency $\tau \ge 0$ applies to general Polish spaces and intrinsically respects metric properties. For suitable ground costs, independence is fully characterized by $\tau = 0$. Via proper normalization of $\tau$, three transport correlations $\rho_\alpha$, $\rho_\infty$, and $\rho_*$ with values in $[0, 1]$ are defined. They attain the value $1$ if and only if $\zeta = \varphi(\xi)$, where $\varphi$ is an $\alpha$-Lipschitz function for $\rho_\alpha$, a measurable function for $\rho_\infty$, or a multiple of an isometry for $\rho_*$. The transport dependency can be estimated consistently by an empirical plug-in approach, and alternative estimators with the same convergence rate but significantly reduced computational costs are also proposed. Numerical results suggest that $\tau$ robustly recovers dependency between data sets with different internal metric structures. The usage for inferential tasks, like transport dependency based independence testing, is illustrated on a data set from a cancer study.
