
Stochastic Gradient Descent applied to Least Squares regularizes in Sobolev spaces

Abstract

We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$\mathbb{E}\, \left\|Ax_{k+1} - b\right\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \left\|Ax_k - b\right\|_2^2 - \frac{2}{\|A\|_F^2} \left\|A^T (Ax_k - b)\right\|_2^2.$$
This is a curious inequality: when applied to a discretization of a partial differential equation like $-\Delta u = f$, the last term measures the regularity of the residual $u_k - u$ in a higher Sobolev space than the remaining terms: if $u_k - u$ has large fourth derivatives (i.e., bi-Laplacian $\Delta^2$), then SGD will dramatically decrease the size of the second derivatives (i.e., $\Delta$) of $u_k - u$. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This implies a regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
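As a rough illustration of the setup: for the objective $\|Ax - b\|_2^2$, SGD that samples one row $a_i$ per step (with probability proportional to $\|a_i\|^2$) and uses the natural step size reduces to the randomized Kaczmarz iteration $x_{k+1} = x_k + (b_i - \langle a_i, x_k\rangle)\,a_i/\|a_i\|^2$. The sketch below is not from the paper; the matrix, iteration count, and seed are illustrative choices, and it only checks that the residual $\|Ax_k - b\|_2$ decays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative, well-conditioned invertible system (assumed setup, not from
# the paper): A = Gaussian noise + strong diagonal, with b = A x_true.
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

# Sample row i with probability ||a_i||^2 / ||A||_F^2, the standard
# importance sampling for SGD on a least-squares objective.
row_norms2 = np.sum(A**2, axis=1)
probs = row_norms2 / row_norms2.sum()

x = np.zeros(n)
residuals = [np.linalg.norm(A @ x - b)]
for _ in range(5000):
    i = rng.choice(n, p=probs)
    # One SGD / randomized Kaczmarz step: project onto {y : <a_i, y> = b_i}.
    x = x + (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    residuals.append(np.linalg.norm(A @ x - b))

print(residuals[0], residuals[-1])  # residual shrinks over the iterations
```

Tracking how the residual's energy distributes over the singular vectors of $A$ along this trajectory is the kind of experiment that would exhibit the cascade from large to small singular values described in the abstract.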
