Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space

We study the problem of distributional matrix completion: given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This generalizes traditional matrix completion, where the observations per matrix entry are scalar-valued. To this end, we use tools from optimal transport to generalize the nearest neighbors method to the distributional setting. Under a suitable latent factor model on probability distributions, we establish that our method recovers the distributions in the Wasserstein metric. We demonstrate through simulations that our method (i) provides better distributional estimates for an entry than using the observed samples for that entry alone, (ii) yields accurate estimates of distributional quantities such as standard deviation and value-at-risk, and (iii) inherently supports heteroscedastic distributions. In addition, we demonstrate our method on a real-world dataset of quarterly earnings prediction distributions. We also prove novel asymptotic results for Wasserstein barycenters over one-dimensional distributions.
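To make the idea concrete, the following is a minimal sketch of a row-wise nearest neighbors imputation for one-dimensional distributions. It is an illustration under stated assumptions, not the authors' exact algorithm: every observed entry is assumed to hold the same number of samples n, the function names (`w2_squared`, `barycenter_1d`, `impute`), the array layout, and the threshold `eta` are hypothetical, and neighbors are selected by a simple average squared distance over shared columns. It relies on two standard facts for equal-size one-dimensional samples: the squared 2-Wasserstein distance is the mean squared difference of the sorted samples, and the 2-Wasserstein barycenter is obtained by averaging the sorted samples.

```python
import numpy as np

def w2_squared(x, y):
    """Squared 2-Wasserstein distance between two 1-D empirical
    distributions with the same number of samples."""
    return float(np.mean((np.sort(x) - np.sort(y)) ** 2))

def barycenter_1d(samples):
    """2-Wasserstein barycenter of equal-size 1-D empirical distributions,
    computed by averaging their order statistics (sorted samples)."""
    return np.mean([np.sort(s) for s in samples], axis=0)

def impute(data, mask, i, j, eta):
    """Estimate the distribution at entry (i, j).

    data: array of shape (rows, cols, n) holding n samples per entry
          (values at unobserved entries are ignored).
    mask: boolean array of shape (rows, cols), True where observed.
    eta:  hypothetical threshold on the average squared W2 distance.
    Returns the sorted support points of the estimate, or None if no
    row qualifies as a neighbor.
    """
    neighbors = []
    for r in range(data.shape[0]):
        if not mask[r, j]:
            continue  # row r has nothing to contribute in column j
        # Columns other than j that are observed in both rows.
        shared = [c for c in range(data.shape[1])
                  if c != j and mask[i, c] and mask[r, c]]
        if not shared:
            continue
        dist = np.mean([w2_squared(data[i, c], data[r, c]) for c in shared])
        if dist <= eta:
            neighbors.append(r)
    if not neighbors:
        return None
    return barycenter_1d([data[r, j] for r in neighbors])
```

If entry (i, j) is itself observed and row i shares at least one other observed column with itself, row i counts as its own neighbor at distance zero, so its samples are pooled with those of similar rows. In this sketch a larger `eta` admits more neighbors, which averages over more samples but also over less similar rows.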
@article{feitelberg2025_2410.13112,
  title={Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space},
  author={Jacob Feitelberg and Kyuseong Choi and Anish Agarwal and Raaz Dwivedi},
  journal={arXiv preprint arXiv:2410.13112},
  year={2025}
}