
A Simplified Analysis of SGD for Linear Regression with Weight Averaging

Alexandru Meterez
Depen Morwani
Costin-Andrei Oncescu
Jingfeng Wu
Cengiz Pehlevan
Sham Kakade
Main: 15 pages
Bibliography: 2 pages
Abstract

Theoretically understanding stochastic gradient descent (SGD) in overparameterized models has led to the development of several optimization algorithms that are widely used in practice today. Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression with a constant learning rate, both with and without tail iterate averaging, based on a bias-variance decomposition of the risk. In this work, we provide a simplified analysis that recovers the same bias and variance bounds as Zou et al. (2021) using elementary linear algebra tools, bypassing the need to manipulate operators on positive semi-definite (PSD) matrices. We believe our work makes the analysis of SGD on linear regression more accessible, and that it will be helpful for further analysis of mini-batching and learning rate scheduling, leading to improvements in the training of realistic models.
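
For readers unfamiliar with the setting, the following is a minimal Python sketch (not the paper's code) of the procedure being analyzed: single-pass SGD on least-squares linear regression with a constant learning rate, returning the tail-averaged iterate (the average of the iterates over the second half of the pass). All function names and hyperparameters here are illustrative assumptions.

import numpy as np

def sgd_tail_average(X, y, lr=0.01, tail_start=None):
    """Single-pass SGD on least squares with a constant step size.

    Returns the tail average of the iterates, i.e. the mean of w_t
    over steps t >= tail_start.
    """
    n, d = X.shape
    if tail_start is None:
        tail_start = n // 2  # average over the second half of the pass
    w = np.zeros(d)
    w_sum = np.zeros(d)
    count = 0
    for t in range(n):
        x_t, y_t = X[t], y[t]
        # stochastic gradient of 0.5 * (x_t @ w - y_t)**2 at the current iterate
        grad = (x_t @ w - y_t) * x_t
        w -= lr * grad
        if t >= tail_start:
            w_sum += w
            count += 1
    return w_sum / count

# Toy usage: well-specified linear model with Gaussian noise.
rng = np.random.default_rng(0)
n, d = 10_000, 20
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)
w_hat = sgd_tail_average(X, y, lr=0.01)
print(np.linalg.norm(w_hat - w_star))

Tail averaging reduces the variance contribution to the risk (coming from the noise in the labels) while the constant learning rate keeps the bias contraction fast; the paper's bias-variance decomposition bounds these two effects separately.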

@article{meterez2025_2506.15535,
  title={A Simplified Analysis of SGD for Linear Regression with Weight Averaging},
  author={Alexandru Meterez and Depen Morwani and Costin-Andrei Oncescu and Jingfeng Wu and Cengiz Pehlevan and Sham Kakade},
  journal={arXiv preprint arXiv:2506.15535},
  year={2025}
}