Double Descent Risk and Volume Saturation Effects: A Geometric Perspective

The appearance of the double-descent risk phenomenon has received growing interest in the machine learning and statistics communities, as it challenges the well-understood notion of the U-shaped train-test curve. Motivated by Rissanen's minimum description length (MDL), Balasubramanian's Occam's razor, and Amari's information geometry, we investigate how the logarithm of the model volume, $\log V$, where $V = \int \sqrt{\det g(\theta)}\,\mathrm{d}\theta$ is the Riemannian volume of the model manifold under the Fisher information metric, extends the intuition behind the AIC and BIC model selection criteria. We find that for the particular model classes of isotropic linear regression and statistical lattices, the $\log V$ term may be decomposed into a sum of distinct components, each of which helps explain the appearance of this phenomenon. In particular, they suggest why generalization error does not necessarily continue to grow with increasing model dimensionality.
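For context, a minimal sketch of how the volume term sits alongside the AIC- and BIC-style penalties, in the standard asymptotic expansion of the Bayesian stochastic complexity (Balasubramanian's razor); the symbols $n$ (sample size), $d$ (parameter dimension), $g$ (Fisher information metric), and $\hat\theta$ (maximum-likelihood estimate) are assumed here, not taken from the abstract:

```latex
% Asymptotic expansion of the stochastic complexity (Balasubramanian's razor):
% a goodness-of-fit term, a BIC-like dimension penalty, and the log model
% volume under the Fisher metric, up to terms that stay bounded in n.
-\log P(D \mid M)
  \;\approx\; \underbrace{-\log P(D \mid \hat\theta)}_{\text{fit}}
  \;+\; \underbrace{\frac{d}{2}\log\frac{n}{2\pi}}_{\text{BIC-like penalty}}
  \;+\; \underbrace{\log \int \sqrt{\det g(\theta)}\,\mathrm{d}\theta}_{\log V}
  \;+\; O(1).
```

The first two terms recover the familiar complexity penalties; the $\log V$ term is the geometric correction whose decomposition the abstract describes.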