FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Abstract

Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than the baseline method over the documents in the r/ChangeMyView corpus. These improvements allow us to examine hypotheses in two settings with large documents: persuasive online arguments on r/ChangeMyView, and authorship attribution in the Australian High Court Judgment corpus. With FastKASSIM, we are able to show that more syntactically similar arguments tend to be more persuasive, and that syntax provides a key indicator of writing style.
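
The pair-and-average idea described above can be illustrated with a minimal sketch. This is not the FastKASSIM implementation: the tree encoding (nested tuples of labels), the function names, and the stand-in kernel (cosine similarity over production-rule counts, a much simpler substitute for a real tree kernel) are all assumptions made for illustration. The one-directional averaging is likewise an assumption, since the abstract does not specify how pairs are aggregated.

```python
from collections import Counter
from math import sqrt

# Hypothetical encoding: a parse tree is a nested tuple (label, child, child, ...).
# Words are omitted so that only syntactic structure is compared.

def productions(tree):
    """Count production rules (parent label -> tuple of child labels) in a tree."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, *children = stack.pop()
        counts[(label, tuple(child[0] for child in children))] += 1
        stack.extend(children)
    return counts

def tree_similarity(t1, t2):
    """Stand-in kernel: cosine similarity of production counts, in [0, 1]."""
    p1, p2 = productions(t1), productions(t2)
    dot = sum(p1[rule] * p2[rule] for rule in p1.keys() & p2.keys())
    norm = sqrt(sum(v * v for v in p1.values())) * sqrt(sum(v * v for v in p2.values()))
    return dot / norm if norm else 0.0

def document_similarity(doc_a, doc_b):
    """Pair each tree in doc_a with its most similar tree in doc_b, then average."""
    if not doc_a or not doc_b:
        return 0.0
    best = [max(tree_similarity(a, b) for b in doc_b) for a in doc_a]
    return sum(best) / len(best)

# Two toy sentences: "DT NN VBZ" and "PRP VBZ DT NN".
s1 = ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBZ",)))
s2 = ("S", ("NP", ("PRP",)), ("VP", ("VBZ",), ("NP", ("DT",), ("NN",))))
score = document_similarity([s1, s2], [s2])
```

Because each tree in `doc_a` is matched to its best counterpart in `doc_b`, the score as written is not symmetric; averaging the two directions would be one way to make it so.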
