On the symmetrical Kullback-Leibler Jeffreys centroids
Clustering histograms has become an important ingredient of modern information processing thanks to the success of the bag-of-words modeling paradigm. Histogram clustering can be performed using the celebrated $k$-means centroid-based algorithm. From the viewpoint of applications, it is usually required to deal with symmetric distances. We consider the Jeffreys divergence that symmetrizes the Kullback-Leibler divergence, and investigate the computation of centroids with respect to that distance. We first prove that the Jeffreys centroid can be expressed analytically in closed form using the Lambert $W$ function for {\em positive} histograms. We then show how to obtain a fast guaranteed tight approximation when dealing with {\em frequency} histograms. Finally, we conclude with some remarks on $k$-means histogram clustering.
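The closed form for positive histograms can be sketched as follows. Setting the gradient of the sum of Jeffreys divergences to zero yields, per bin, the stationarity condition $\log(c_i/g_i) + 1 = a_i/c_i$, where $a_i$ and $g_i$ are the arithmetic and geometric means of that bin across the input histograms; solving gives $c_i = a_i / W(e\, a_i/g_i)$ with $W$ the principal branch of the Lambert $W$ function. A minimal sketch using SciPy's `lambertw` (the function name `jeffreys_centroid_positive` is our own, not from the paper):

```python
import numpy as np
from scipy.special import lambertw

def jeffreys_centroid_positive(H):
    """Closed-form Jeffreys centroid of positive histograms.

    H: array of shape (n, d), one positive histogram per row.
    Per bin i, the centroid is c_i = a_i / W(e * a_i / g_i), where
    a_i and g_i are the arithmetic and geometric means of bin i and
    W is the principal branch of the Lambert W function.
    """
    H = np.asarray(H, dtype=float)
    a = H.mean(axis=0)                        # arithmetic mean per bin
    g = np.exp(np.log(H).mean(axis=0))        # geometric mean per bin
    # lambertw returns a complex array; the principal branch is real
    # here since e * a / g >= e > 0.
    return a / lambertw(np.e * a / g).real
```

One can verify the result numerically by checking the stationarity condition $\log(c_i/g_i) + 1 = a_i/c_i$ in each bin.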