
$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time

Abstract

We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has $k$ hidden layers and that the $\ell_1$-norm of the incoming weights of any neuron is bounded by $L$. We present a kernel-based method that, with probability at least $1 - \delta$, learns a predictor whose generalization error is at most $\epsilon$ worse than that of the neural network. The sample complexity and the time complexity of the method are polynomial in the input dimension and in $(1/\epsilon, \log(1/\delta), F(k,L))$, where $F(k,L)$ is a function of $(k,L)$ and of the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like and ReLU-like activation functions. The result implies that any sufficiently sparse neural network is learnable in polynomial time.
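
The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea of improper learning with a kernel predictor. The inverse-polynomial kernel $k(x,z) = 1/(2 - \langle x, z\rangle)$, the kernel ridge-regression fit, and the regularization value `lam` are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch of improper learning with a kernel predictor.
# Assumptions not taken from the paper: the inverse-polynomial kernel
# k(x, z) = 1 / (2 - <x, z>), the kernel ridge-regression fit, and the
# regularization value `lam` are illustrative choices only.
import numpy as np

def inverse_poly_kernel(X, Z):
    """k(x, z) = 1 / (2 - <x, z>); well defined when inputs lie in the unit ball."""
    return 1.0 / (2.0 - X @ Z.T)

def fit_kernel_ridge(X, y, lam=1e-2):
    """Fit alpha solving (K + lam * n * I) alpha = y; predictor f(x) = sum_i alpha_i k(x_i, x)."""
    n = X.shape[0]
    K = inverse_poly_kernel(X, X)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, alpha, X_test):
    return inverse_poly_kernel(X_test, X_train) @ alpha

# Toy usage: labels produced by a one-hidden-layer network whose incoming
# weights have unit l1-norm, matching the bounded-norm setting of the abstract.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # place inputs on the unit sphere
W = rng.normal(size=(5, d))
W /= np.abs(W).sum(axis=1, keepdims=True)       # l1-normalize each neuron's incoming weights
v = rng.normal(size=5)
y = np.tanh(X @ W.T) @ v                        # outputs of the target network
alpha = fit_kernel_ridge(X, y)
print("train MSE:", np.mean((predict(X, alpha, X) - y) ** 2))
```

The sketch mirrors what "improper" means here: the returned predictor is a kernel expansion over the training points rather than a neural network, even though it is fit on data labeled by one.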
