
$\ell_1$-regularized Neural Networks are Improperly Learnable in Polynomial Time

Abstract

We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has $k$ hidden layers and that the $\ell_1$-norm of the incoming weights of any neuron is bounded by $L$. We present a kernel-based method that, with probability at least $1 - \delta$, learns a predictor whose generalization error is at most $\epsilon$ worse than that of the neural network. The sample complexity and the time complexity of the method are polynomial in the input dimension and in $(1/\epsilon, \log(1/\delta), F(k,L))$, where $F(k,L)$ is a function of $(k,L)$ and of the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like and ReLU-like activation functions. The result implies that any sufficiently sparse neural network is learnable in polynomial time.
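
The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea of improper learning with a kernel predictor. The inverse-polynomial kernel $k(x,z) = 1/(2 - \langle x, z\rangle)$, the kernel ridge-regression fit, and the regularization value `lam` are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch of improper learning with a kernel predictor.
# Assumptions not taken from the paper: the inverse-polynomial kernel
# k(x, z) = 1 / (2 - <x, z>), the kernel ridge-regression fit, and the
# regularization value `lam` are illustrative choices only.
import numpy as np

def inverse_poly_kernel(X, Z):
    """k(x, z) = 1 / (2 - <x, z>); well defined when inputs lie in the unit ball."""
    return 1.0 / (2.0 - X @ Z.T)

def fit_kernel_ridge(X, y, lam=1e-2):
    """Fit alpha solving (K + lam * n * I) alpha = y; predictor f(x) = sum_i alpha_i k(x_i, x)."""
    n = X.shape[0]
    K = inverse_poly_kernel(X, X)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, alpha, X_test):
    return inverse_poly_kernel(X_test, X_train) @ alpha

# Toy usage: labels produced by a one-hidden-layer network whose incoming
# weights have unit l1-norm, matching the bounded-norm setting of the abstract.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # place inputs on the unit sphere
W = rng.normal(size=(5, d))
W /= np.abs(W).sum(axis=1, keepdims=True)       # l1-normalize each neuron's incoming weights
v = rng.normal(size=5)
y = np.tanh(X @ W.T) @ v                        # outputs of the target network
alpha = fit_kernel_ridge(X, y)
print("train MSE:", np.mean((predict(X, alpha, X) - y) ** 2))
```

The sketch mirrors what "improper" means here: the returned predictor is a kernel expansion over the training points rather than a neural network, even though it is fit on data labeled by one.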
