The Importance of Being Lazy: Scaling Limits of Continual Learning

20 June 2025
Jacopo Graldi
Alessandro Breccia
Giulia Lanzillotta
Thomas Hofmann
Lorenzo Noci
Main: 8 pages · 26 figures · 1 table · Bibliography: 5 pages · Appendix: 29 pages
Abstract

Despite recent efforts, neural networks still struggle to learn in non-stationary environments, and our understanding of catastrophic forgetting (CF) is far from complete. In this work, we perform a systematic study on the impact of model scale and the degree of feature learning in continual learning. We reconcile existing contradictory observations on scale in the literature by differentiating between lazy and rich training regimes through a variable parameterization of the architecture. We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness. Using the framework of dynamical mean field theory, we then study the infinite-width dynamics of the model in the feature learning regime and characterize CF, extending prior theoretical results limited to the lazy regime. We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks. We identify a transition modulated by task similarity where the model exits an effectively lazy regime with low forgetting to enter a rich regime with significant forgetting. Finally, our findings reveal that neural networks achieve optimal performance at a critical level of feature learning, which depends on task non-stationarity and transfers across model scales. This work provides a unified perspective on the role of scale and feature learning in continual learning.

@article{graldi2025_2506.16884,
  title={The Importance of Being Lazy: Scaling Limits of Continual Learning},
  author={Jacopo Graldi and Alessandro Breccia and Giulia Lanzillotta and Thomas Hofmann and Lorenzo Noci},
  journal={arXiv preprint arXiv:2506.16884},
  year={2025}
}