
Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines

Abstract

Rhetoric, both spoken and written, involves not only content but also style. One common stylistic tool is parallelism: the juxtaposition of phrases which have the same sequence of linguistic (e.g., phonological, syntactic, semantic) features. Despite the ubiquity of parallelism, the field of natural language processing has seldom investigated it, missing a chance to better understand the nature of the structure, meaning, and intent that humans convey. To address this, we introduce the task of rhetorical parallelism detection. We construct a formal definition of it; we provide one new Latin dataset and one adapted Chinese dataset for it; we establish a family of metrics to evaluate performance on it; and, lastly, we create baseline systems and novel sequence labeling schemes to capture it. On our strictest metric, we attain F1 scores of 0.40 and 0.43 on our Latin and Chinese datasets, respectively.
