A New Method for Learning Deep Recurrent Neural Networks

A novel architecture of a recurrent neural network (RNN), integrated with a fully connected deep neural network (DNN) as its feature extractor, is presented. This deep-RNN is equipped with both causal temporal prediction and non-causal look-ahead, realized via auto-regression (AR) and moving-average (MA) components, respectively. We describe a primal-dual training method that formulates learning RNNs as a formal optimization problem with an inequality constraint that guarantees stability of the network dynamics. Experimental results demonstrate the effectiveness of this new method, which achieves 18.86% phone recognition error on the TIMIT core test set. The results also show that the ARMA version of the deep-RNN is more effective than the AR version, and that using a DNN to provide high-level abstraction of the raw filter-bank speech data as the input to the RNN gives much lower recognition error than feeding the raw features to the RNN directly.
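The architecture described above can be illustrated with a minimal sketch: a fully connected DNN maps each filter-bank frame to a high-level feature vector, and the recurrence combines an auto-regressive term on the previous hidden state with a moving average over current and future feature frames (the non-causal look-ahead). The layer sizes, the tanh nonlinearities, the averaging window, and the spectral-norm rescaling used as a stand-in for the paper's stability constraint are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dnn_features(x, layers):
    """Fully connected DNN feature extractor (tanh layers; sizes are illustrative)."""
    h = x
    for W, b in layers:
        h = np.tanh(h @ W + b)
    return h

def project_stable(W, rho=0.99):
    """Rescale the recurrent matrix so its spectral norm stays below rho.
    A simple stand-in for the stability (inequality) constraint in the paper."""
    s = np.linalg.norm(W, 2)  # largest singular value
    return W * (rho / s) if s > rho else W

def arma_rnn(feats, W_rec, W_in, lookahead=2):
    """ARMA-style recurrence: h_t = tanh(W_rec h_{t-1} + W_in * MA_t),
    where MA_t averages feature frames t..t+lookahead (non-causal look-ahead)."""
    T, _ = feats.shape
    h = np.zeros(W_rec.shape[0])
    out = []
    for t in range(T):
        ma = feats[t:min(t + lookahead + 1, T)].mean(axis=0)  # moving-average (MA) part
        h = np.tanh(W_rec @ h + W_in @ ma)                    # auto-regressive (AR) part
        out.append(h.copy())
    return np.stack(out)

# Hypothetical dimensions: 50 frames of 40-dim filter-bank input.
x = rng.standard_normal((50, 40))
layers = [(0.1 * rng.standard_normal((40, 64)), np.zeros(64)),
          (0.1 * rng.standard_normal((64, 32)), np.zeros(32))]
feats = dnn_features(x, layers)
W_rec = project_stable(rng.standard_normal((16, 16)))
W_in = 0.1 * rng.standard_normal((16, 32))
states = arma_rnn(feats, W_rec, W_in)  # shape (50, 16)
```

Setting `lookahead=0` reduces the moving average to the current frame alone, recovering the purely causal AR variant that the experiments compare against.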