Minimax rates of entropy estimation on large alphabets via best polynomial approximation

Yihong Wu
Pengkun Yang
Abstract

Consider the problem of estimating the Shannon entropy of a distribution over $k$ elements from $n$ independent samples. We show that the minimax mean-square error is within universal multiplicative constant factors of $\Big(\frac{k}{n \log k}\Big)^2 + \frac{\log^2 k}{n}$ if $n$ exceeds a constant factor of $\frac{k}{\log k}$; otherwise there exists no consistent estimator. This refines the recent result of Valiant-Valiant \cite{VV11} that the minimal sample size for consistent entropy estimation scales according to $\Theta(\frac{k}{\log k})$. The apparatus of best polynomial approximation plays a key role in both the construction of optimal estimators and, via a duality argument, the minimax lower bound.
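
For intuition, here is a minimal Python sketch contrasting the naive plug-in (empirical) entropy estimator with the order of the minimax rate stated above. This is not the paper's estimator, which corrects the bias on small-probability symbols via best polynomial approximation; the helper names `plugin_entropy` and `minimax_mse_order`, and the choice of a uniform source, are ours for illustration. Near the critical sample size $n \asymp \frac{k}{\log k}$, the plug-in estimate is visibly biased low.

```python
import numpy as np

def plugin_entropy(samples: np.ndarray) -> float:
    """Naive plug-in (maximum-likelihood) estimate of Shannon entropy, in nats."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def minimax_mse_order(n: int, k: int) -> float:
    """Order of the minimax mean-square error from the abstract (constants omitted)."""
    return (k / (n * np.log(k))) ** 2 + np.log(k) ** 2 / n

# Uniform distribution on k symbols, with n a small constant multiple of k / log k,
# i.e., near the critical sample size identified in the paper.
rng = np.random.default_rng(0)
k = 10_000
n = 5 * int(k / np.log(k))
samples = rng.integers(0, k, size=n)

print(f"true entropy     : {np.log(k):.3f} nats")
print(f"plug-in estimate : {plugin_entropy(samples):.3f} nats")  # biased low
print(f"minimax MSE order: {minimax_mse_order(n, k):.4f}")
```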