arXiv:2305.13989

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages

23 May 2023
Cheikh M. Bamba Dione
David Ifeoluwa Adelani
Peter Nabende
Jesujoba Oluwadara Alabi
Thapelo Sindane
Happy Buzaaba
Shamsuddeen Hassan Muhammad
Chris C. Emezue
Perez Ogayo
Anuoluwapo Aremu
Catherine Gitau
Derguene Mbaye
Jonathan Mukiibi
Blessing K. Sibanda
Bonaventure F. P. Dossou
Andiswa Bukula
Rooweither Mabuya
A. Tapo
Edwin Munkoh-Buabeng
V. M. Koagne
F. Kabore
Amelia Taylor
Godson Kalipe
Tebogo Macucwa
Vukosi Marivate
T. Gwadabe
Mboning Tchiaze Elvis
I. Onyenwe
G. Atindogbé
T. Adelani
Idris Akinade
Olanrewaju Samuel
M. Nahimana
Théogene Musabeyezu
Emile Niyomutabazi
Ester Chimhenga
Kudzai Gotosa
Patrick Mizha
Apelete Agbolo
Seydou T. Traoré
C. Uchechukwu
Aliyu Yusuf
M. Abdullahi
Dietrich Klakow
Abstract

In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges of annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using conditional random fields (CRF) and several multilingual pre-trained language models, and we applied various cross-lingual transfer models trained on data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves POS tagging performance on the target languages, particularly when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the target's language family and morphosyntactic properties seems more effective for POS tagging in unseen languages.
