Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization

Abstract

Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical $L_2$ risk are universally consistent and achieve good rates of convergence. In this paper, we show that the regularization term is not necessary to obtain similar results. For a suitably chosen initialization of the network, a suitable number of gradient descent steps, and a suitable step size, we show that an estimate without a regularization term is universally consistent for bounded predictor variables. Additionally, we show that if the regression function is Hölder smooth with Hölder exponent $1/2 \leq p \leq 1$, the $L_2$ error converges to zero at a rate of approximately $n^{-1/(1+d)}$. Furthermore, in the case of an interaction model, where the regression function is a sum of Hölder smooth functions each depending on $d^*$ components, a rate of convergence is derived which does not depend on the input dimension $d$.
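
To make the setting concrete, the following is a minimal sketch (in JAX) of the kind of estimate studied here: plain gradient descent applied to the unregularized empirical $L_2$ risk of an over-parametrized feedforward network. The architecture, activation, initialization, step size, and number of steps below are illustrative assumptions, not the specific choices analyzed in the paper; the only point of the sketch is the objective being minimized, i.e. the empirical $L_2$ risk with no penalty term.

```python
# Minimal sketch of the estimation setup (not the paper's construction):
# gradient descent on the unregularized empirical L2 risk of an
# over-parametrized feedforward network. Architecture, activation,
# initialization scale, step size, and number of steps are illustrative.
import jax
import jax.numpy as jnp

def init_params(key, widths):
    """Random initialization (the paper assumes a suitably chosen one)."""
    params = []
    for k_in, k_out in zip(widths[:-1], widths[1:]):
        key, w_key = jax.random.split(key)
        w = jax.random.normal(w_key, (k_in, k_out)) / jnp.sqrt(k_in)
        params.append((w, jnp.zeros(k_out)))
    return params

def predict(params, x):
    """Deep feedforward network; sigmoid activations chosen for illustration."""
    h = x
    for w, b in params[:-1]:
        h = jax.nn.sigmoid(h @ w + b)
    w, b = params[-1]
    return (h @ w + b).squeeze(-1)

def empirical_l2_risk(params, x, y):
    """(1/n) * sum_i (f(x_i) - y_i)^2 -- note: no regularization term."""
    return jnp.mean((predict(params, x) - y) ** 2)

@jax.jit
def gd_step(params, x, y, lr):
    """One full-batch gradient descent step with constant step size lr."""
    grads = jax.grad(empirical_l2_risk)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# Toy regression data: d = 2 bounded predictors, n = 200 samples.
key = jax.random.PRNGKey(0)
x_key, noise_key, init_key = jax.random.split(key, 3)
x = jax.random.uniform(x_key, (200, 2))
y = jnp.sin(2.0 * x[:, 0]) + x[:, 1] ** 2 + 0.1 * jax.random.normal(noise_key, (200,))

# Over-parametrized: far more weights than the n = 200 observations.
params = init_params(init_key, widths=[2, 256, 256, 1])
for _ in range(5000):          # number of steps chosen ad hoc for the demo
    params = gd_step(params, x, y, lr=0.05)
print("empirical L2 risk after training:", empirical_l2_risk(params, x, y))
```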
