We study non-convex empirical risk minimization for learning halfspaces and neural networks. For loss functions that are L-Lipschitz continuous, we present algorithms that learn halfspaces and multi-layer neural networks to arbitrarily small excess risk ε > 0. The time complexity is polynomial in the input dimension d and the sample size n, but exponential in a polynomial of the ratio L/ε. These algorithms run multiple rounds of random initialization followed by arbitrary optimization steps. We further show that if the data is separable by some neural network with constant margin γ > 0, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin Ω(γ). As a consequence, the algorithm achieves arbitrarily small generalization error ε > 0 with polynomial sample and time complexity. We establish the same learnability result when the labels are randomly flipped with probability η < 1/2.
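The following is a minimal sketch, not the paper's exact procedure, of the random-restart scheme the abstract describes: draw a random initialization, run an arbitrary local optimizer on the empirical risk of a halfspace under a Lipschitz surrogate loss, repeat, and keep the candidate with the smallest empirical risk. The choice of loss, restart count, step size, and optimizer below are illustrative assumptions.

```python
import numpy as np

def empirical_risk(w, X, y):
    """Empirical risk of the halfspace x -> sign(<w, x>) under an assumed
    1-Lipschitz surrogate loss (ramp loss clipped to [0, 1])."""
    margins = y * (X @ w)
    return np.mean(np.clip(1.0 - margins, 0.0, 1.0))

def learn_halfspace(X, y, n_restarts=50, n_steps=200, lr=0.1, seed=None):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_w, best_risk = None, np.inf
    for _ in range(n_restarts):                  # multiple rounds of random initialization
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)                   # start uniformly on the unit sphere
        for _ in range(n_steps):                 # "arbitrary optimization steps"; here, projected subgradient descent
            margins = y * (X @ w)
            active = (margins > 0.0) & (margins < 1.0)   # points where the clipped loss has nonzero slope
            if active.any():
                grad = -(X[active] * y[active, None]).sum(axis=0) / n
            else:
                grad = np.zeros(d)
            w -= lr * grad
            w /= max(np.linalg.norm(w), 1e-12)   # project back onto the unit sphere
        risk = empirical_risk(w, X, y)
        if risk < best_risk:                     # keep the restart with the lowest empirical risk
            best_w, best_risk = w, risk
    return best_w, best_risk
```

In this sketch the exponential cost shows up only through the number of restarts needed to land near a good initialization; each individual restart costs time polynomial in n and d.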