Memorizing Gaussians with no over-parameterization via gradient descent on neural networks

Abstract
We prove that a single step of gradient descent over a depth-two network with $q$ hidden neurons, starting from orthogonal initialization, can memorize $\Omega\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$. The result is valid for a large class of activation functions, which includes the absolute value.
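The paper itself gives a proof rather than code; as a rough illustration of the setting the abstract describes, here is a minimal sketch, assuming a depth-two network $f(x) = \sum_{i=1}^{q} u_i\,\sigma(\langle w_i, x \rangle)$ with absolute-value activation, an orthogonally initialized hidden layer, a fixed second layer, standard Gaussian inputs with uniform $\pm 1$ labels, and one full-batch gradient step on the square loss. The dimensions `d`, `q`, `n`, the learning rate, and the fixed second layer are illustrative assumptions, not the paper's exact construction or constants.

```python
import torch

torch.manual_seed(0)

d, q, n = 128, 128, 512  # input dim, hidden neurons, sample count (illustrative)
lr = 0.5                 # learning rate (illustrative)

# Data: n independent standard Gaussians in R^d with random +-1 labels.
X = torch.randn(n, d)
y = torch.sign(torch.randn(n))

# Depth-two network f(x) = sum_i u_i * sigma(<w_i, x>), sigma = absolute value.
W = torch.empty(q, d)
torch.nn.init.orthogonal_(W)  # orthogonal initialization of the hidden layer
W.requires_grad_(True)
u = torch.ones(q) / q         # second layer held fixed (a simplifying assumption)

def f(X):
    return torch.abs(X @ W.T) @ u

# A single full-batch gradient step on the square loss, updating only W.
loss = 0.5 * ((f(X) - y) ** 2).mean()
loss.backward()
with torch.no_grad():
    W -= lr * W.grad

# Inspect how many training points the post-step network labels correctly.
with torch.no_grad():
    acc = (torch.sign(f(X)) == y).float().mean()
print(f"train sign-agreement after one step: {acc.item():.3f}")
```

With these toy dimensions the single step need not reach perfect memorization; the theorem concerns the regime where $n = \Omega\left(\frac{dq}{\log^4(d)}\right)$ and the stated class of activations, so the snippet only mirrors the training procedure, not the guarantee.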