Stochastic Gradient Descent applied to Least Squares regularizes in Sobolev spaces
Abstract
We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \to \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$ \mathbb{E}\,\left\|Ax_{k+1} - b\right\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right)\left\|Ax_k - b\right\|_2^2 - \frac{2}{\|A\|_F^2}\left\|A^T(Ax_k - b)\right\|_2^2. $$
This is a curious inequality: when applied to a discretization of a partial differential equation like $-\Delta u = f$, the last term measures the regularity of the residual in a higher Sobolev space than the remaining terms: if the residual has large fourth derivatives (i.e. a large bi-Laplacian $\Delta^2$), then SGD will dramatically decrease the size of its second derivatives (i.e. $\Delta$). For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This implies a regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
