DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

17 April 2021

Main: 9 pages, 8 figures, 3 tables. Bibliography: 3 pages. Appendix: 1 page.

Abstract

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded, contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible diachronic and synchronic uses for this dataset.
