We define and use a new inter-textual distance which, unlike other common approaches, measures differences not only in occurrences of words but also in occurrences of multi-word phrases. We show that this distance can be calculated easily and use it for statistical analysis of some sample corpora of genuine and fake scientific papers.
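To give a rough idea of what a phrase-aware distance can look like, here is a minimal sketch. It is not the definition used in the paper, which the abstract does not state. It assumes whitespace tokenization, contiguous phrases of up to `max_n` words, and a normalized sum of absolute count differences. The names `ngram_counts`, `phrase_distance`, and `max_n` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every contiguous phrase of 1..max_n words in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def phrase_distance(text_a, text_b, max_n=2):
    """Illustrative distance in [0, 1] between two texts.

    Assumption (not from the paper): the distance is the sum of absolute
    differences in phrase counts, divided by the total number of phrase
    occurrences in both texts.
    """
    a = ngram_counts(text_a.lower().split(), max_n)
    b = ngram_counts(text_b.lower().split(), max_n)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # two empty texts are treated as identical
    # A Counter returns 0 for phrases it has never seen.
    diff = sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())
    return diff / total
```

With `max_n=1` this reduces to a word-count comparison, so two texts with the same words in a different order are at distance zero. Raising `max_n` makes the measure sensitive to word order through the multi-word phrases.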