Are LLMs Good Text Diacritizers? An Arabic and Yorùbá Case Study

13 June 2025

Main: 6 pages · Bibliography: 2 pages · Appendix: 2 pages · 7 figures · 4 tables

Abstract

We investigate the effectiveness of large language models (LLMs) for text diacritization in two typologically distinct languages: Arabic and Yorùbá. To enable a rigorous evaluation, we introduce MultiDiac, a novel multilingual dataset with diverse samples that capture a range of diacritic ambiguities. We evaluate 14 LLMs that vary in size, accessibility, and language coverage, and benchmark them against six specialized diacritization models. Additionally, we fine-tune four small open-source models for Yorùbá using LoRA. Our results show that many off-the-shelf LLMs outperform specialized diacritization models for both Arabic and Yorùbá, but smaller models suffer from hallucinations. Fine-tuning on a small dataset can improve diacritization performance and reduce hallucination rates.
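The abstract does not say how hallucinations are detected. One plausible reading is that a diacritizer should add marks without changing the underlying letters. The sketch below illustrates that check under this assumption. The function names (`strip_latin_marks`, `strip_arabic_marks`, `alters_base_text`) and the chosen set of Arabic marks are illustrative and not taken from the paper.

```python
import unicodedata

# Assumption: the common Arabic tashkeel marks U+064B..U+0652
# (tanween, short vowels, shadda, sukun) plus superscript alef U+0670.
ARABIC_MARKS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_latin_marks(text: str) -> str:
    """Remove combining marks from Latin-script text such as Yorùbá.

    Decomposes to NFD, drops nonspacing marks (category Mn), and
    recomposes to NFC. This removes tone marks and under-dots alike.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def strip_arabic_marks(text: str) -> str:
    """Remove Arabic diacritics by explicit code point.

    NFD is avoided here because it would also split hamza and madda
    off their carrier letters, which are not optional diacritics.
    """
    return "".join(ch for ch in text if ch not in ARABIC_MARKS)


def alters_base_text(source: str, prediction: str, strip) -> bool:
    """Return True if the prediction changes anything besides diacritics."""
    return strip(source) != strip(prediction)
```

For example, `alters_base_text("Yoruba", "Yorùbá", strip_latin_marks)` is `False`, while a prediction that inserts or drops a letter yields `True`. A real evaluation would likely also need to handle whitespace and punctuation normalization, which this sketch leaves out.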


@article{toyin2025_2506.11602,
  title={Are LLMs Good Text Diacritizers? An Arabic and Yorùbá Case Study},
  author={Hawau Olamide Toyin and Samar M. Magdy and Hanan Aldarmaki},
  journal={arXiv preprint arXiv:2506.11602},
  year={2025}
}
