527

Layer-Wise Multi-View Decoding for Natural Language Generation

Abstract

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem and substantially improve the performance of sequence-to-sequence learning with deep representations on diverse tasks, i.e., machine translation, abstractive summarization and image captioning. In particular, our approach surpasses the previous state-of-the-art models on three benchmark machine translation datasets.

View on arXiv
Comments on this paper