When do Models Generalize? A Perspective from Data-Algorithm Compatibility

One of the major open problems in machine learning theory is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent. In many scenarios, their failure can be attributed to the fact that they obscure the crucial interplay between the training algorithm and the underlying data distribution. To address this shortcoming, we propose a concept named compatibility, which quantitatively characterizes generalization in a manner that is both data-relevant and algorithm-relevant. By considering the entire training trajectory and focusing on early-stopping iterates, compatibility fully exploits the algorithm information and therefore yields better generalization guarantees. We validate this by theoretically studying compatibility in the setting of overparameterized linear regression with gradient descent. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in this setting. Our theoretical results show that, in the sense of compatibility, generalization holds with significantly weaker restrictions on the problem instance than in the previous last-iterate analysis.
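To make the setting concrete, the following is a minimal numerical sketch (not the paper's construction) of overparameterized linear regression trained with gradient descent, tracking test risk along the whole trajectory so that the best early-stopping iterate can be compared with the last iterate. All problem sizes, the noise level, and the step size are illustrative assumptions.

```python
import numpy as np

# Illustrative overparameterized regression instance: d >> n.
rng = np.random.default_rng(0)
n, d = 50, 500
w_star = np.zeros(d)
w_star[:5] = 1.0                      # simple low-effective-dimension signal (assumption)
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)   # noisy labels

# Held-out data to estimate the population risk along the trajectory.
X_test = rng.standard_normal((2000, d)) / np.sqrt(d)
y_test = X_test @ w_star

w = np.zeros(d)
lr = 1.0
risks = []
for t in range(500):
    grad = X.T @ (X @ w - y) / n      # gradient of the mean-squared training loss
    w -= lr * grad
    risks.append(np.mean((X_test @ w - y_test) ** 2))

best_t = int(np.argmin(risks))
print(f"best early-stopping risk (t={best_t}): {risks[best_t]:.4f}")
print(f"last-iterate risk (t={len(risks)-1}): {risks[-1]:.4f}")
```

Comparing the minimum of the risk curve with its final value illustrates why an analysis over the whole trajectory (as compatibility requires) can give guarantees where a last-iterate analysis alone does not.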