In distributional semantics, the pointwise mutual information (PMI) weighting of the cooccurrence matrix performs far better than raw counts. There is, however, an issue with unobserved pair cooccurrences, as PMI goes to negative infinity. This problem is aggravated by unreliable statistics from finite corpora, which lead to a large number of such pairs. A common practice is to clip negative PMI at 0, also known as Positive PMI (PPMI). In this paper, we investigate alternative ways of dealing with negative PMI and, more importantly, study the role that negative information plays in the performance of a low-rank, weighted factorization of different PMI matrices. Using various semantic and syntactic tasks as probes into models which use either negative or positive PMI (or both), we find that most of the encoded semantics and syntax come from positive PMI, in contrast to negative PMI, which contributes almost exclusively syntactic information. Our findings deepen our understanding of distributional semantics, while also introducing novel PMI variants and grounding the popular PPMI measure.
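For concreteness, the following is a minimal NumPy sketch (not the paper's implementation) of the PMI weighting and its PPMI clipping; the function name `pmi_matrix` is hypothetical, and the code assumes a dense count matrix whose row and column marginals are all nonzero.

```python
import numpy as np

def pmi_matrix(counts, positive=False):
    """Illustrative (P)PMI weighting of a cooccurrence count matrix.

    PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ). Unobserved pairs
    (count 0) yield PMI = -inf; PPMI clips these, and all other
    negative values, to 0.
    """
    total = counts.sum()
    p_wc = counts / total                             # joint probabilities
    p_w = counts.sum(axis=1, keepdims=True) / total   # word marginals
    p_c = counts.sum(axis=0, keepdims=True) / total   # context marginals
    with np.errstate(divide="ignore"):                # log(0) -> -inf for unseen pairs
        pmi = np.log(p_wc) - np.log(p_w * p_c)
    if positive:
        pmi = np.maximum(pmi, 0.0)                    # clip negative PMI at 0 -> PPMI
    return pmi
```

With `positive=False` the matrix retains the negative (and -inf) entries whose contribution the paper probes; with `positive=True` it yields the standard PPMI matrix, and a low-rank factorization of either variant can then be obtained with, e.g., truncated SVD.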