SentiLR: Linguistic Knowledge Enhanced Language Representation for Sentiment Analysis

6 November 2019

Pei Ke

Abstract

Most existing pre-trained language representation models neglect the linguistic knowledge of texts, whereas we argue that such knowledge can promote language understanding in various NLP tasks. To benefit downstream tasks in sentiment analysis, we propose a novel language representation model called SentiLR, which introduces word-level linguistic knowledge, including part-of-speech tags and prior sentiment polarity from SentiWordNet. During pre-training, we first acquire the prior sentiment polarity of each word by querying the SentiWordNet dictionary with its part-of-speech tag. Then, we devise a new pre-training task called label-aware masked language model, consisting of two sub-tasks: 1) word knowledge recovery given the sentence-level label; 2) sentence-level label prediction with linguistic knowledge enhanced context. Experiments show that SentiLR achieves state-of-the-art performance on sentence-level and aspect-level sentiment analysis and on sentiment-aware data augmentation.
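The lookup step described in the abstract, obtaining a word's prior sentiment polarity from its part-of-speech tag, can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_LEXICON` table, its scores, the tag abbreviations, and the function names `prior_polarity` and `tag_sentence` are all hypothetical stand-ins for a real SentiWordNet query, and the rule of comparing a single positive score with a single negative score is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for SentiWordNet.
# Keys are (word, POS tag); values are (positive score, negative score).
# The scores are invented for illustration, not taken from SentiWordNet.
TOY_LEXICON: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("good", "a"): (0.75, 0.0),
    ("terrible", "a"): (0.0, 0.625),
    ("like", "v"): (0.5, 0.0),
    ("like", "n"): (0.0, 0.0),
}


def prior_polarity(word: str, pos: str) -> str:
    """Return a coarse polarity label for a word given its POS tag.

    Words missing from the lexicon are treated as neutral (an assumption).
    """
    pos_score, neg_score = TOY_LEXICON.get((word.lower(), pos), (0.0, 0.0))
    if pos_score > neg_score:
        return "positive"
    if neg_score > pos_score:
        return "negative"
    return "neutral"


def tag_sentence(tagged_tokens: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Attach a prior polarity label to every (word, POS) pair."""
    return [(word, pos, prior_polarity(word, pos)) for word, pos in tagged_tokens]


if __name__ == "__main__":
    sentence = [("I", "n"), ("like", "v"), ("this", "n"), ("good", "a"), ("movie", "n")]
    for word, pos, polarity in tag_sentence(sentence):
        print(f"{word}\t{pos}\t{polarity}")
```

Keying the lexicon on the (word, tag) pair rather than on the word alone reflects the point made in the abstract: the same surface form can carry different prior polarity depending on its part of speech, as with "like" used as a verb versus a noun in this toy table.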
