Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the estimation error with respect to the GD iterations, which is bounded away from zero without a delicate early-stopping scheme. In turn, through a comprehensive analysis of $\ell_2$-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with $\ell_2$ regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax-optimal rate of estimation error can be achieved. Numerical experiments confirm our theory and further demonstrate that the regularization approach improves training robustness and works for a wider range of neural networks.
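As a rough illustration of claim (1), the kernel ridge regression estimator that the regularized network is compared against can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's implementation: the kernel below is the arc-cosine form of the neural tangent kernel that arises when only the first-layer weights of a one-hidden-layer ReLU network are trained, and the function names (`ntk_relu`, `krr_fit_predict`), the ridge scaling `n * lam`, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def ntk_relu(X, Z):
    """Arc-cosine NTK of a one-hidden-layer ReLU network.

    Assumption: only first-layer weights are trained, so that
    K(x, z) = <x, z> * (pi - angle(x, z)) / (2 * pi).
    Inputs are assumed to be nonzero vectors (rows of X and Z).
    """
    norm_x = np.linalg.norm(X, axis=1, keepdims=True)
    norm_z = np.linalg.norm(Z, axis=1, keepdims=True)
    gram = X @ Z.T
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    cos = np.clip(gram / (norm_x * norm_z.T), -1.0, 1.0)
    return gram * (np.pi - np.arccos(cos)) / (2.0 * np.pi)

def krr_fit_predict(X_train, y_train, X_test, lam):
    """Kernel ridge regression: solve (K + n * lam * I) alpha = y, then predict."""
    n = len(y_train)
    K = ntk_relu(X_train, X_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return ntk_relu(X_test, X_train) @ alpha

# Toy data (hypothetical): noisy observations of a smooth function on the unit circle.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=50)
X = np.column_stack([np.cos(angles), np.sin(angles)])
y = np.sin(2.0 * angles) + 0.5 * rng.standard_normal(50)

grid = np.linspace(0.0, 2.0 * np.pi, 200)
X_grid = np.column_stack([np.cos(grid), np.sin(grid)])

pred_near_interp = krr_fit_predict(X, y, X_grid, lam=1e-6)  # nearly interpolates the noise
pred_regularized = krr_fit_predict(X, y, X_grid, lam=1e-2)  # ridge penalty smooths the fit
```

Plotting `pred_near_interp` and `pred_regularized` against `np.sin(2.0 * grid)` gives a quick visual comparison of the near-interpolating and regularized fits; the value of `lam` here is arbitrary and would need tuning in practice.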