Fuzzy paraphrases in learning word representations with a corpus and a lexicon
We identify a pitfall that previous works using lexicons or ontologies to train or improve distributed word representations have not carefully addressed: for polysemous words and for expressions whose meaning changes with context, their paraphrases or related entities in a lexicon or ontology are unreliable and can even degrade the learned word representations. To address this problem, we propose an approach that treats each paraphrase of a word in a lexicon not as a full paraphrase but as a fuzzy member (i.e., a fuzzy paraphrase) of the word's paraphrase set, whose degree of truth (i.e., membership) depends on the context. We then propose an efficient method that uses fuzzy paraphrases to learn word embeddings: we approximately estimate the local membership of each paraphrase, and train word embeddings jointly with a lexicon by randomly replacing words in their contexts with paraphrases, subject to each paraphrase's membership. Experimental results show that our method is efficient, overcomes the weakness of previous related works in extracting semantic information, and outperforms previous methods for learning word representations with lexicons.
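The replacement step described above can be illustrated with a small sketch. It assumes a lexicon given as a plain dictionary from a word to its paraphrases and a caller-supplied `membership(word, paraphrase, context)` function returning a value in [0, 1]; both names are hypothetical, and how the local membership is actually estimated is not shown. The sampling rule used here (one Bernoulli trial per paraphrase, then a uniform choice among the survivors) is one possible reading of "randomly subject to the membership", not necessarily the authors' exact procedure.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: membership(word, paraphrase, context) -> degree of truth in [0, 1].
# How this value is estimated is NOT shown here; the caller supplies it.
MembershipFn = Callable[[str, str, Sequence[str]], float]


def fuzzy_replace(
    context: Sequence[str],
    lexicon: Dict[str, List[str]],
    membership: MembershipFn,
    rng: random.Random,
) -> List[str]:
    """Return a copy of `context` in which words may be swapped for paraphrases.

    Each paraphrase p of a word w survives as a candidate with probability
    membership(w, p, context) (one Bernoulli trial per paraphrase). If several
    candidates survive, one is chosen uniformly; if none survive, w is kept.
    """
    out: List[str] = []
    for word in context:
        candidates = [
            p for p in lexicon.get(word, [])
            if rng.random() < membership(word, p, context)
        ]
        out.append(rng.choice(candidates) if candidates else word)
    return out


if __name__ == "__main__":
    # Toy lexicon and toy membership function (both hypothetical).
    lexicon = {"bank": ["shore", "lender"], "big": ["large"]}

    def toy_membership(word: str, para: str, ctx: Sequence[str]) -> float:
        # Favour "shore" when "river" appears in the context, "lender" otherwise.
        if para == "shore":
            return 0.9 if "river" in ctx else 0.05
        if para == "lender":
            return 0.05 if "river" in ctx else 0.9
        return 0.5

    rng = random.Random(0)
    print(fuzzy_replace(["the", "big", "river", "bank"], lexicon, toy_membership, rng))
```

Because the draw is repeated every time a context is visited, paraphrases with higher membership in that context are substituted more often. The modified context would then presumably feed an ordinary context-based embedding update (e.g., a word2vec-style objective), which this sketch does not implement.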