
Information Theory of Penalized Likelihoods and its Statistical Implications

Abstract

We extend the correspondence between two-stage coding procedures in data compression and penalized likelihood procedures in statistical estimation. Traditionally, this correspondence required restriction to countable parameter spaces; we show how to extend it to the uncountable parameter case. Leveraging the description length interpretation of penalized likelihood procedures, we devise new techniques for deriving adaptive risk bounds for such procedures. We show that the existence of certain countable coverings of the parameter space implies adaptive risk bounds, and thus our theory is quite general. We apply our techniques to derive risk bounds for $\ell_1$-type penalized procedures in canonical high-dimensional statistical problems such as linear regression and Gaussian graphical models. In the linear regression problem, we also demonstrate how the traditional $\ell_0$ penalty times $\frac{\log n}{2}$, plus lower-order terms, admits a two-stage description length interpretation, and we present risk bounds for this penalized likelihood procedure.
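To make the $\ell_0$ penalty concrete, the following sketch illustrates the penalized likelihood criterion the abstract mentions for linear regression: minimize the Gaussian negative log-likelihood plus $\frac{\log n}{2}$ per nonzero coefficient, over coordinate subsets. This is an illustrative implementation written for this note, not code from the paper; the function names (`penalized_nll`, `best_subset`) and the simulated data are assumptions, and lower-order penalty terms are omitted.

```python
import itertools
import numpy as np

def penalized_nll(X, y, support, penalty_per_param):
    """Profiled Gaussian negative log-likelihood of OLS restricted to a
    coordinate subset, plus an l0 penalty of penalty_per_param per
    selected coefficient. Illustrative sketch only."""
    n = len(y)
    if support:
        Xs = X[:, support]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    else:
        resid = y
    sigma2 = max(resid @ resid / n, 1e-12)  # profiled-out noise variance
    nll = 0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return nll + len(support) * penalty_per_param

def best_subset(X, y):
    """Exhaustive subset search under the (log n)/2-per-parameter penalty.
    Exponential in p, so only sensible for small p in this toy example."""
    n, p = X.shape
    pen = np.log(n) / 2  # the l0 penalty weight from the abstract
    subsets = (list(s) for k in range(p + 1)
               for s in itertools.combinations(range(p), k))
    return min(subsets, key=lambda s: penalized_nll(X, y, s, pen))

# Toy data: only the first coordinate carries signal.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(n)
print(best_subset(X, y))
```

With a strong signal on the first coordinate, the penalty (about 2.3 per parameter here) is typically enough to exclude the noise coordinates, so the selected support is essentially the true one.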
