SAM: Semantic Attribute Modulated Language Modeling

Abstract

As a fundamental task in natural language processing, language modeling aims to estimate the distribution of word sequences. However, most existing algorithms focus on the main text and often ignore the widely available semantic attributes of documents, e.g., titles, authors, sentiments and tags. To address this issue, we propose Semantic Attribute Modulated (SAM) language modeling, a novel language modeling framework that incorporates various semantic attributes. Attributes are selected automatically with an attribute attention mechanism. We build three text datasets with a diversity of semantic attributes. On these three datasets, we empirically examine the language model perplexities of several typical corpora and demonstrate the superiority of our model with different combinations of attributes. Extensive qualitative results, including word semantic analysis, attention values and an interesting lyric generation example, further demonstrate the effectiveness of our SAM method.
