Moment-based Uniform Deviation Bounds for $k$-means and Friends

Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with $p \geq 4$ bounded moments; in particular, the difference between the sample cost and distribution cost decays with $m$ and $p$ as $m^{\min\{-1/4, -1/2+2/p\}}$. The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of $k$-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for $k$-means instances possessing some cluster structure.
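As a rough illustration of the two quantities being compared, the sketch below evaluates the $k$-means cost of a fixed set of centers on a small sample and on a much larger sample that stands in for the distribution cost. Everything in it is an assumption made for illustration and does not come from the paper: the function names `kmeans_cost` and `draw`, the two-component Gaussian source (which has all moments bounded, so it is a mild case), the hand-picked centers, and the sample sizes. It checks a single fixed set of centers, whereas the bounds described above hold uniformly over centers.

```python
import random


def kmeans_cost(points, centers):
    """Average squared Euclidean distance from each point to its nearest center."""
    total = 0.0
    for x in points:
        total += min(
            sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers
        )
    return total / len(points)


def draw(rng, n):
    """Hypothetical source: equal mixture of two unit-variance Gaussians in 2-D."""
    points = []
    for _ in range(n):
        mu = rng.choice((-3.0, 3.0))  # pick a mixture component
        points.append((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)))
    return points


rng = random.Random(0)

# Centers fixed by hand (an assumption); in practice they would come from
# a heuristic such as Lloyd's method run on the sample.
centers = [(-3.0, -3.0), (3.0, 3.0)]

sample = draw(rng, 200)        # the m points the centers are evaluated on
reference = draw(rng, 20000)   # large draw used as a proxy for the distribution

sample_cost = kmeans_cost(sample, centers)
reference_cost = kmeans_cost(reference, centers)
gap = abs(sample_cost - reference_cost)

print(f"sample cost    = {sample_cost:.4f}")
print(f"reference cost = {reference_cost:.4f}")
print(f"gap            = {gap:.4f}")
```

Repeating the run with larger values for the size of `sample` gives an informal sense of how the gap shrinks with the number of points; a heavier-tailed source would be needed to see the dependence on the number of bounded moments.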