Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization

3 February 2025
Simone Bombari
Marco Mondelli
Abstract

Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data, with negative implications for robustness, bias, and fairness. In this work, we provide a statistical characterization of this phenomenon for high-dimensional regression, when the data contains a predictive core feature $x$ and a spurious feature $y$. Specifically, we quantify the amount of spurious correlations $C$ learned via linear regression in terms of the data covariance and the strength $\lambda$ of the ridge regularization. As a consequence, we first capture the simplicity of $y$ through the spectrum of its covariance, and its correlation with $x$ through the Schur complement of the full data covariance. Next, we prove a trade-off between $C$ and the in-distribution test loss $L$ by showing that the value of $\lambda$ that minimizes $L$ lies in an interval where $C$ is increasing. Finally, we investigate the effects of over-parameterization via the random features model, by showing its equivalence to regularized linear regression. Our theoretical results are supported by numerical experiments on Gaussian, Color-MNIST, and CIFAR-10 datasets.
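The quantities named in the abstract can be made concrete with a small NumPy sketch. Everything in it is an illustrative assumption, not the paper's construction: a toy Gaussian model in which the spurious feature y is a noisy copy of the core feature x and the label depends on x only; ridge regression with a (1/n)-normalized loss; and a hypothetical helper `spurious_proxy` (the output variance contributed by the y block, theta_y^T Sigma_yy theta_y) standing in for C, whose exact definition is given in the paper. The helper `schur_complement` computes Sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy, the part of the covariance of y not explained by x.

```python
import numpy as np


def make_data(n, d, beta, noise_y=0.5, noise_label=0.1, rng=None):
    """Toy model (an assumption, not the paper's setup): core x ~ N(0, I_d),
    spurious y = x + noise_y * N(0, I_d), label depends on x only."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((n, d))
    y = x + noise_y * rng.standard_normal((n, d))  # y is correlated with x
    g = x @ beta + noise_label * rng.standard_normal(n)  # label uses x only
    return x, y, g


def ridge(z, g, lam):
    """Minimize (1/n) * ||g - z @ theta||^2 + lam * ||theta||^2 in closed form."""
    n, p = z.shape
    return np.linalg.solve(z.T @ z / n + lam * np.eye(p), z.T @ g / n)


def schur_complement(cov, d_x):
    """Sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy for cov = [[Sxx, Sxy], [Syx, Syy]]."""
    s_xx, s_xy = cov[:d_x, :d_x], cov[:d_x, d_x:]
    s_yx, s_yy = cov[d_x:, :d_x], cov[d_x:, d_x:]
    return s_yy - s_yx @ np.linalg.solve(s_xx, s_xy)


def spurious_proxy(theta, cov, d_x):
    """Hypothetical stand-in for C: output variance from the y block alone,
    theta_y^T Sigma_yy theta_y. The paper's definition of C may differ."""
    theta_y = theta[d_x:]
    return float(theta_y @ cov[d_x:, d_x:] @ theta_y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 5
    beta = np.ones(d)
    x, y, g = make_data(2000, d, beta, rng=rng)
    x_te, y_te, g_te = make_data(2000, d, beta, rng=rng)
    z, z_te = np.hstack([x, y]), np.hstack([x_te, y_te])
    # Population covariance of (x, y): Cov(x, y) = I, Var(y) = (1 + 0.5**2) I
    eye = np.eye(d)
    cov = np.block([[eye, eye], [eye, 1.25 * eye]])
    print("Schur complement diag:", np.diag(schur_complement(cov, d)))
    for lam in [1e-6, 1e-2, 1.0]:
        theta = ridge(z, g, lam)
        test_loss = np.mean((g_te - z_te @ theta) ** 2)
        print(f"lambda={lam:g}  proxy={spurious_proxy(theta, cov, d):.4f}  "
              f"test_loss={test_loss:.4f}")
```

In this toy model the Schur complement is 0.25 I_d, i.e. the variance of the noise added to y. With lambda close to zero the fit puts almost no weight on y, since the label depends only on x; a moderate lambda shrinks the core weights and moves part of the weight onto the correlated y block, which is the kind of regularization-dependent spurious reliance the abstract quantifies.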

@article{bombari2025_2502.01347,
  title={Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization},
  author={Simone Bombari and Marco Mondelli},
  journal={arXiv preprint arXiv:2502.01347},
  year={2025}
}