
Stochastic Gradient Descent applied to Least Squares regularizes in Sobolev spaces

Abstract

We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$\mathbb{E}\, \left\|Ax_{k+1} - b\right\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \left\|Ax_k - b\right\|_2^2 - \frac{2}{\|A\|_F^2} \left\|A^T (Ax_k - b)\right\|_2^2.$$
This is a curious inequality: when applied to a discretization of a partial differential equation like $-\Delta u = f$, the last term measures the regularity of the residual $u_k - u$ in a higher Sobolev space than the remaining terms: if $u_k - u$ has large fourth derivatives (i.e., bi-Laplacian $\Delta^2$), then SGD will dramatically decrease the size of the second derivatives (i.e., $\Delta$) of $u_k - u$. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This implies a regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
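As a rough illustration of the setup: for the objective $\|Ax - b\|_2^2$, SGD that samples one row $a_i$ per step (with probability proportional to $\|a_i\|^2$) and uses the natural step size reduces to the randomized Kaczmarz iteration $x_{k+1} = x_k + (b_i - \langle a_i, x_k\rangle)\,a_i/\|a_i\|^2$. The sketch below is not from the paper; the matrix, iteration count, and seed are illustrative choices, and it only checks that the residual $\|Ax_k - b\|_2$ decays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative, well-conditioned invertible system (assumed setup, not from
# the paper): A = Gaussian noise + strong diagonal, with b = A x_true.
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

# Sample row i with probability ||a_i||^2 / ||A||_F^2, the standard
# importance sampling for SGD on a least-squares objective.
row_norms2 = np.sum(A**2, axis=1)
probs = row_norms2 / row_norms2.sum()

x = np.zeros(n)
residuals = [np.linalg.norm(A @ x - b)]
for _ in range(5000):
    i = rng.choice(n, p=probs)
    # One SGD / randomized Kaczmarz step: project onto {y : <a_i, y> = b_i}.
    x = x + (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    residuals.append(np.linalg.norm(A @ x - b))

print(residuals[0], residuals[-1])  # residual shrinks over the iterations
```

Tracking how the residual's energy distributes over the singular vectors of $A$ along this trajectory is the kind of experiment that would exhibit the cascade from large to small singular values described in the abstract.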
