
A Simplified Analysis of SGD for Linear Regression with Weight Averaging

Alexandru Meterez
Depen Morwani
Costin-Andrei Oncescu
Jingfeng Wu
Cengiz Pehlevan
Sham Kakade
Main: 15 pages
Bibliography: 2 pages
Abstract

Theoretically understanding stochastic gradient descent (SGD) in overparameterized models has led to the development of several optimization algorithms that are widely used in practice today. Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression with a constant learning rate, both with and without tail iterate averaging, based on a bias-variance decomposition of the risk. In this work, we provide a simplified analysis that recovers the same bias and variance bounds as Zou et al. (2021) using elementary linear algebra tools, bypassing the need to manipulate operators on positive semi-definite (PSD) matrices. We believe our work makes the analysis of SGD on linear regression more accessible, and that it will be helpful for further analysis of mini-batching and learning rate scheduling, leading to improvements in the training of realistic models.
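
For readers unfamiliar with the setting, the following is a minimal Python sketch (not the paper's code) of the procedure being analyzed: single-pass SGD on least-squares linear regression with a constant learning rate, returning the tail-averaged iterate (the average of the iterates over the second half of the pass). All function names and hyperparameters here are illustrative assumptions.

import numpy as np

def sgd_tail_average(X, y, lr=0.01, tail_start=None):
    """Single-pass SGD on least squares with a constant step size.

    Returns the tail average of the iterates, i.e. the mean of w_t
    over steps t >= tail_start.
    """
    n, d = X.shape
    if tail_start is None:
        tail_start = n // 2  # average over the second half of the pass
    w = np.zeros(d)
    w_sum = np.zeros(d)
    count = 0
    for t in range(n):
        x_t, y_t = X[t], y[t]
        # stochastic gradient of 0.5 * (x_t @ w - y_t)**2 at the current iterate
        grad = (x_t @ w - y_t) * x_t
        w -= lr * grad
        if t >= tail_start:
            w_sum += w
            count += 1
    return w_sum / count

# Toy usage: well-specified linear model with Gaussian noise.
rng = np.random.default_rng(0)
n, d = 10_000, 20
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)
w_hat = sgd_tail_average(X, y, lr=0.01)
print(np.linalg.norm(w_hat - w_star))

Tail averaging reduces the variance contribution to the risk (coming from the noise in the labels) while the constant learning rate keeps the bias contraction fast; the paper's bias-variance decomposition bounds these two effects separately.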

@article{meterez2025_2506.15535,
  title={A Simplified Analysis of SGD for Linear Regression with Weight Averaging},
  author={Alexandru Meterez and Depen Morwani and Costin-Andrei Oncescu and Jingfeng Wu and Cengiz Pehlevan and Sham Kakade},
  journal={arXiv preprint arXiv:2506.15535},
  year={2025}
}