Implicit bias of SGD in $L_2$-regularized linear DNNs: One-way jumps from high to low rank

Abstract

The $L_2$-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices with different ranks. In tasks such as matrix completion, the goal is to converge to the local minimum with the smallest rank that still fits the training data. While rank-underestimating minima can be avoided since they do not fit the data, GD might get stuck at rank-overestimating minima. We show that with SGD, there is always a probability to jump from a higher-rank minimum to a lower-rank one, but the probability of jumping back is zero. More precisely, we define a sequence of sets $B_1 \subset B_2 \subset \cdots \subset B_R$ so that $B_r$ contains all minima of rank $r$ or less (and not more) that are absorbing for small enough ridge parameters $\lambda$ and learning rates $\eta$: SGD has probability 0 of leaving $B_r$, and from any starting point there is a non-zero probability for SGD to enter $B_r$.
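
As a minimal sketch of the setting described above (not the paper's experimental setup), the snippet below trains a small deep linear network with minibatch SGD on an $L_2$-regularized matrix-completion loss and prints the singular values of the end-to-end matrix. The depth, width, rank-2 target, entry-sampling scheme, and hyperparameters (`lam`, `eta`, `batch`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrix-completion target: a rank-2 ground-truth matrix (illustrative choice).
n, true_rank = 20, 2
M = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, n))

# Randomly observed entries form the training set.
mask = rng.random((n, n)) < 0.5
obs = np.argwhere(mask)

# Deep linear network: end-to-end matrix W = W_L ... W_1 (here L = 3 square layers).
L, width = 3, n
Ws = [rng.normal(scale=0.5 / np.sqrt(width), size=(width, width)) for _ in range(L)]

lam, eta, batch = 1e-3, 0.02, 8  # ridge parameter, learning rate, minibatch size


def end_to_end(Ws):
    W = np.eye(width)
    for Wl in Ws:
        W = Wl @ W
    return W


for step in range(20000):
    # Sample a minibatch of observed entries; the gradient of the data-fit term
    # is supported only on those entries.
    idx = obs[rng.integers(len(obs), size=batch)]
    W = end_to_end(Ws)
    G = np.zeros((n, n))
    G[idx[:, 0], idx[:, 1]] = 2.0 * (W - M)[idx[:, 0], idx[:, 1]] / batch

    # Gradient w.r.t. each layer: left^T G right^T plus the ridge term 2 * lam * W_l.
    grads = []
    for l in range(L):
        left = np.eye(width)
        for Wl in Ws[l + 1:]:
            left = Wl @ left
        right = np.eye(width)
        for Wl in Ws[:l]:
            right = Wl @ right
        grads.append(left.T @ G @ right.T + 2.0 * lam * Ws[l])

    # Simultaneous SGD update of all layers.
    for l in range(L):
        Ws[l] = Ws[l] - eta * grads[l]

# Singular values of the learned end-to-end matrix: for small enough lam and eta,
# SGD is expected to settle near a low-rank minimum (jumps in rank go only downward).
svals = np.linalg.svd(end_to_end(Ws), compute_uv=False)
print("leading singular values:", np.round(svals[:5], 3))
```

In this sketch, a few leading singular values well above the rest indicate which rank-$r$ set $B_r$ the iterate has settled into; running longer or with a different seed should only ever move the effective rank downward, never back up.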
