Knowledge-Rich BERT Embeddings for Readability Assessment

Recent Advances in Natural Language Processing (RANLP), 2021

15 June 2021

Joseph Marvin Imperial


Main: 6 pages; Bibliography: 1 page; Appendix: 1 page; 2 figures; 3 tables

Abstract

Automatic readability assessment (ARA) is the task of evaluating how easy or difficult a text document is for a target audience. One of the field's many open problems is making models trained for the task effective even for low-resource languages. In this study, we propose an alternative way of utilizing the information-rich embeddings of BERT models through a joint-learning method that combines them with handcrafted linguistic features for readability assessment. Results show that the proposed method outperforms classical approaches to readability assessment on English and Filipino datasets, obtaining an increase of up to 12.4% in F1 performance. We also show that the knowledge encoded in BERT embeddings can serve as a substitute feature set for low-resource languages like Filipino, which have limited semantic and syntactic NLP tools for explicitly extracting feature values for the task.
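To make the idea of combining embeddings with handcrafted features concrete, the following is a minimal sketch that assumes the combination is a simple concatenation of the two feature vectors; the abstract does not spell out the mechanism, so this is one plausible reading rather than the paper's implementation. The names `handcrafted_features`, `combine`, and `toy_embed` are hypothetical, the three surface features are illustrative stand-ins for a real linguistic feature set, and `toy_embed` is a placeholder for an actual BERT document embedding.

```python
import re
from typing import Callable, List


def handcrafted_features(text: str) -> List[float]:
    """Three simple surface features (illustrative, not the paper's feature set)."""
    # Split into rough sentences on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def combine(text: str, embed: Callable[[str], List[float]]) -> List[float]:
    """Concatenate an embedding vector with the handcrafted features.

    Assumption: plain concatenation; `embed` is any function that maps
    a text to a fixed-length vector.
    """
    return list(embed(text)) + handcrafted_features(text)


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a BERT embedding (4 dimensions, all zeros)."""
    return [0.0, 0.0, 0.0, 0.0]


vector = combine("The cat sat. The dog ran fast.", toy_embed)
print(len(vector))  # 4 embedding dimensions + 3 handcrafted features
```

In practice, `toy_embed` would be replaced by a function that returns a real sentence or document embedding, and the combined vector would be passed to a classifier trained on readability labels.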
