Event-Arguments Extraction Corpus and Modeling using BERT for Arabic

Abstract

Event-argument extraction is a challenging task, particularly in Arabic due to sparse linguistic resources. To fill this gap, we introduce the \hadath corpus (550k tokens) as an extension of Wojood, enriched with event-argument annotations. We used three types of event arguments: agent, location, and date, which we annotated as relation types. Our inter-annotator agreement evaluation resulted in a Kappa score of 82.23% and an F1-score of 87.2%. Additionally, we propose a novel method for event relation extraction using BERT, in which we treat the task as text entailment. This method achieves an F1-score of 94.01%. To further evaluate the generalization of our proposed method, we collected and annotated another out-of-domain corpus (about 80k tokens) called \testNLI and used it as a second test set, on which our approach achieved promising results (83.59% F1-score). Last but not least, we propose an end-to-end system for event-arguments extraction. This system is implemented as part of SinaTools, and both corpora are publicly available at https://sina.birzeit.edu/wojood.
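
To illustrate what treating relation extraction as text entailment can look like, the following is a minimal sketch, not the paper's implementation. It pairs a sentence (the premise) with one generated hypothesis per candidate argument and relation type. The hypothesis templates, the `EntailmentPair` class, and the `build_pairs` function are all hypothetical names introduced here for illustration; only the three relation types (agent, location, date) come from the abstract.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical hypothesis templates, one per argument type named in the abstract.
# The wording of these templates is an assumption, not taken from the paper.
TEMPLATES = {
    "agent": "{arg} is an agent of {event}",
    "location": "{arg} is the location of {event}",
    "date": "{arg} is the date of {event}",
}


@dataclass
class EntailmentPair:
    premise: str      # the original sentence
    hypothesis: str   # a generated statement about one (event, argument) pair
    relation: str     # the relation type the hypothesis asserts


def build_pairs(sentence: str, event: str, candidates: List[str]) -> List[EntailmentPair]:
    """Create one premise/hypothesis pair per candidate argument and relation type."""
    pairs = []
    for arg in candidates:
        for relation, template in TEMPLATES.items():
            hypothesis = template.format(arg=arg, event=event)
            pairs.append(EntailmentPair(sentence, hypothesis, relation))
    return pairs


# Example usage with an invented sentence.
pairs = build_pairs(
    "The summit was held in Amman on 5 May.",
    event="The summit",
    candidates=["Amman", "5 May"],
)
for pair in pairs:
    print(pair.relation, "|", pair.hypothesis)
```

Under this framing, a sentence-pair classifier such as a fine-tuned BERT model would label each pair as entailed or not, and the entailed pairs would be read back as event-argument relations.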
