How should we aggregate ratings? Accounting for personal rating scales via Wasserstein barycenters

A common method of making quantitative conclusions in qualitative situations is to collect numerical ratings on a linear scale. We investigate the problem of calculating aggregate numerical ratings from individual numerical ratings and propose a new, non-parametric model for the problem. We show that, with minimal modeling assumptions, the equal-weights average is inconsistent for estimating the quality of items. Analyzing the problem from the perspective of optimal transport, we derive an alternative rating estimator, which we show is asymptotically consistent almost surely and in $L^p$ for estimating quality, with an optimal rate of convergence. Further, we generalize Kendall's W, a non-parametric coefficient of preference concordance between raters, from the special case of rankings to the more general case of arbitrary numerical ratings. Along the way, we prove Glivenko-Cantelli-type theorems for uniform convergence of the cumulative distribution functions and quantile functions for Wasserstein-2 Fréchet means on $[0,1]$.
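As a rough illustration of the central object, and not the paper's estimator: on the real line, the Wasserstein-2 Fréchet mean (barycenter) of several distributions has a quantile function equal to the average of the individual quantile functions. The minimal Python sketch below applies that fact to empirical ratings on [0,1]. The function name `barycenter_quantiles`, the two hypothetical raters, and the use of NumPy's default interpolated quantiles are all assumptions made for this example.

```python
import numpy as np

def barycenter_quantiles(ratings_by_rater, levels):
    """Quantile function of the 1-D Wasserstein-2 barycenter at `levels`.

    Uses the standard one-dimensional fact that the barycenter's quantile
    function is the pointwise average of the individual quantile functions.
    Illustrative helper only; the name and interface are hypothetical.
    """
    # One row of empirical quantiles per rater (NumPy's default linear interpolation).
    per_rater = np.array([np.quantile(r, levels) for r in ratings_by_rater])
    # Average across raters at each quantile level.
    return per_rater.mean(axis=0)

# Hypothetical data: a harsh rater and a lenient rater, both on the [0, 1] scale.
harsh = np.array([0.1, 0.2, 0.3, 0.4])
lenient = np.array([0.6, 0.7, 0.8, 0.9])

levels = np.linspace(0.0, 1.0, 5)  # quantile levels 0, 0.25, 0.5, 0.75, 1
bary = barycenter_quantiles([harsh, lenient], levels)
print(bary)
```

In this toy example the harsh rater's median (0.25) and the lenient rater's median (0.75) sit at the same quantile level, which corresponds to 0.5 on the barycenter's scale. This is the sense in which a barycenter can act as a common reference scale across raters with different personal scales.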