We consider deep neural networks (formally equivalent to sum-product networks \cite{PoDo11}), in which the output of each node is a quadratic function of its inputs. Similar to other deep architectures, these networks can compactly represent any function on a finite training set. The main goal of this paper is the derivation of a provably efficient, layer-by-layer algorithm for training such networks, which we denote as the \emph{Basis Learner}. Unlike most, if not all, previous algorithms for training deep neural networks, our algorithm comes with formal polynomial-time convergence guarantees. Moreover, the algorithm is a universal learner in the sense that the training error is guaranteed to decrease at every iteration, and can eventually reach zero under mild conditions. We present practical implementations of this algorithm, as well as preliminary but quite promising experimental results. We also compare our deep architecture to shallow architectures for learning polynomials, in particular kernel learning.
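To make the architecture concrete, one way to instantiate such a node (an illustrative parameterization assumed here, not necessarily the exact construction used in the body of the paper) is as a full quadratic form of its input vector $x \in \mathbb{R}^{d}$:
\[
f(x) \;=\; x^\top A x + b^\top x + c,
\qquad A \in \mathbb{R}^{d \times d},\; b \in \mathbb{R}^{d},\; c \in \mathbb{R}.
\]
Under this assumed parameterization, composing $t$ such layers yields polynomials of degree up to $2^{t}$ in the network input, which gives one sense in which depth buys representational power over a single quadratic layer.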