We show that the moment generating function of the Kullback-Leibler divergence between the empirical distribution of $n$ independent samples from a distribution over a finite alphabet of size $k$ (e.g. a multinomial distribution) and the distribution itself is no more than that of a gamma distribution with shape $k - 1$ and rate $n$. The resulting exponential concentration inequality becomes meaningful (less than 1) when the divergence $\varepsilon$ is larger than $(k - 1)/n$, whereas the standard method of types bound requires $\varepsilon$ larger than $\frac{k - 1}{n}\log\left(1 + \frac{n}{k - 1}\right)$, thus saving a factor of order $\log(n/k)$ in the standard regime of parameters where $n \gg k$. Our proof proceeds via a simple reduction to the case $k = 2$ of a binary alphabet (e.g. a binomial distribution), and has the property that improvements in the case $k = 2$ directly translate to improvements for general $k$.
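The gamma-MGF bound yields, via a standard Chernoff argument, the tail bound $P[\mathrm{KL}(\hat{p} \| p) \ge \varepsilon] \le e^{-n\varepsilon}\left(\frac{en\varepsilon}{k-1}\right)^{k-1}$ for $\varepsilon \ge (k-1)/n$. A minimal Monte Carlo sketch of that bound (not from the paper; the constants `n`, `k`, `eps`, and `trials` are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p_hat, p):
    """KL divergence KL(p_hat || p), with the 0 * log 0 = 0 convention."""
    mask = p_hat > 0
    return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / p[mask])))

n, k, trials = 200, 4, 20000
p = np.full(k, 1.0 / k)          # uniform source distribution (illustrative)
eps = 2.0 * (k - 1) / n          # just above the (k - 1)/n threshold

# Empirical tail probability of the KL divergence over many sampled datasets.
divs = np.array([kl(rng.multinomial(n, p) / n, p) for _ in range(trials)])
emp_tail = float(np.mean(divs >= eps))

# Chernoff bound implied by the gamma(shape k-1, rate n) MGF dominance:
# P[KL >= eps] <= exp(-n*eps) * (e*n*eps / (k-1))**(k-1), for eps >= (k-1)/n.
bound = np.exp(-n * eps) * (np.e * n * eps / (k - 1)) ** (k - 1)

print(f"empirical tail ~ {emp_tail:.4f}, Chernoff bound = {bound:.4f}")
```

With these parameters the bound is already below 1 at $\varepsilon = 2(k-1)/n$, while a method-of-types bound of the form $\binom{n+k-1}{k-1} e^{-n\varepsilon}$ would still exceed 1 there.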