Can SGD Learn Recurrent Neural Networks with Provable Generalization?

Recurrent Neural Networks (RNNs) are among the most popular models in sequential data analysis. However, due to the complexity arising from their recurrent structure, they remain one of the least theoretically understood neural-network models. In particular, existing generalization bounds for RNNs mostly scale exponentially with the length of the input sequence, which limits their practical implications. In this paper, we show that if the input labels are (approximately) realizable by certain classes of (non-linear) functions of the input sequences, then using vanilla stochastic gradient descent (SGD), RNNs can actually learn them efficiently, meaning that both the training time and sample complexity scale only polynomially with the input length (or almost polynomially, depending on the classes of non-linearity).
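To make the learning setting concrete, below is a minimal sketch (not the paper's construction): sequences are labeled by a simple non-linear function of the inputs, and an Elman RNN is trained on them with plain SGD. The target function, network sizes, and all hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the setting: labels realizable by a non-linear function of
# the input sequence, fit by an RNN trained with vanilla SGD.
# All choices below (target function, widths, learning rate) are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

L, d, n_samples = 10, 5, 512  # sequence length, input dim, training set size

def target_label(x):
    # Example "realizable" target: a smooth non-linear function of the sequence.
    return torch.tanh(x.sum(dim=(1, 2)))

X = torch.randn(n_samples, L, d)
y = target_label(X)

class ElmanRNN(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.rnn = nn.RNN(d_in, d_hidden, nonlinearity="relu", batch_first=True)
        self.head = nn.Linear(d_hidden, 1)

    def forward(self, x):
        out, _ = self.rnn(x)                       # out: (batch, L, d_hidden)
        return self.head(out[:, -1]).squeeze(-1)   # read out the last hidden state

model = ElmanRNN(d_in=d, d_hidden=128)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)  # plain SGD, no momentum
loss_fn = nn.MSELoss()

for step in range(2000):
    idx = torch.randint(0, n_samples, (32,))        # mini-batch of sequences
    loss = loss_fn(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
```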