
Memorizing Gaussians with no over-parameterization via gradient descent on neural networks

Abstract

We prove that a single step of gradient descent over a depth-two network, with $q$ hidden neurons, starting from orthogonal initialization, can memorize $\Omega\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$. The result is valid for a large class of activation functions, which includes the absolute value.
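To make the setting concrete, below is a minimal numerical sketch of the setup the abstract describes, not the paper's construction or proof. The dimensions ($d$, $q$, $n$), step size, squared loss, fixed second layer, and sign-agreement check are all illustrative assumptions; the sketch also takes $q \le d$ so the first layer can be initialized with exactly orthonormal rows.

```python
import numpy as np

# Sketch of the setting: a depth-two network
#   f(x) = sum_j v_j * |w_j . x|
# with q hidden neurons and absolute-value activation, first-layer
# weights initialized to orthonormal rows, trained for one full-batch
# gradient descent step on the squared loss over n randomly labeled
# Gaussian inputs. All hyperparameters here are assumed for illustration.

rng = np.random.default_rng(0)
d, q = 256, 128          # input dimension and hidden width (assumed)
n = 512                  # number of Gaussians to memorize (assumed)
lr = 0.5                 # step size (assumed)

X = rng.standard_normal((n, d))          # independent Gaussian inputs
y = rng.choice([-1.0, 1.0], size=n)      # random +/-1 labels

# Orthogonal initialization: rows of W are orthonormal (requires q <= d).
W = np.linalg.qr(rng.standard_normal((d, q)))[0].T   # shape (q, d)
v = rng.choice([-1.0, 1.0], size=q) / np.sqrt(q)     # fixed second layer

def forward(W):
    # Network outputs f(x_i) for all samples; activation is |.|
    return np.abs(X @ W.T) @ v

# One gradient descent step on W for the loss 0.5 * mean((f - y)^2).
pre = X @ W.T                            # pre-activations, shape (n, q)
residual = forward(W) - y                # f(x_i) - y_i, shape (n,)
# dL/dw_j = (1/n) sum_i (f_i - y_i) * v_j * sign(w_j . x_i) * x_i
grad_W = ((residual[:, None] * v[None, :] * np.sign(pre)).T @ X) / n
W1 = W - lr * grad_W

# Illustrative memorization check: fraction of labels whose sign the
# network output matches after the single step.
acc = np.mean(np.sign(forward(W1)) == y)
print(f"sign agreement after one step: {acc:.3f}")
```

The paper's theorem concerns the regime where the number of memorized points scales as $\Omega(dq/\log^4(d))$; the constants and the success criterion in this toy run are not taken from the paper.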
