QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation

2 June 2025
Ahmed Wasfy
Omer Nacar
Abdelakreem Elkhateb
Mahmoud Reda
Omar Elshehy
Adel Ammar
Wadii Boulila
Abstract

The inherent complexities of Arabic script, namely its cursive nature, diacritical marks (tashkeel), and varied typography, pose persistent challenges for Optical Character Recognition (OCR). We present Qari-OCR, a series of vision-language models derived from Qwen2-VL-2B-Instruct, progressively optimized for Arabic through iterative fine-tuning on specialized synthetic datasets. Our leading model, QARI v0.2, establishes a new open-source state-of-the-art with a Word Error Rate (WER) of 0.160, Character Error Rate (CER) of 0.061, and BLEU score of 0.737 on diacritically rich texts. Qari-OCR demonstrates superior handling of tashkeel, diverse fonts, and document layouts, alongside impressive performance on low-resolution images. Further explorations (QARI v0.3) showcase strong potential for structural document understanding and handwritten text. This work delivers a marked improvement in Arabic OCR accuracy and efficiency, with all models and datasets released to foster further research.
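The WER and CER figures quoted above are edit-distance metrics. The sketch below shows one common way to compute them, assuming plain whitespace tokenization for words and Unicode code points for characters; the abstract does not describe the paper's exact evaluation protocol (text normalization, tokenization), so the function names and choices here are illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: code-point edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative strings (not from the paper), written with escapes so the
# code points are unambiguous: a two-word reference whose first word carries
# three fatha marks (U+064E), and a hypothesis that drops all three.
ref = "\u0630\u064E\u0647\u064E\u0628\u064E \u0627\u0644\u0648\u0644\u062F"
hyp = "\u0630\u0647\u0628 \u0627\u0644\u0648\u0644\u062F"

print(wer(ref, hyp))  # 0.5  (one of two words differs)
print(cer(ref, hyp))  # 0.25 (3 deleted marks / 12 reference code points)
```

Because tashkeel marks are separate combining code points, a prediction that recovers every base letter but omits the diacritics is still penalized under this definition, which is why scores on diacritically rich text are a demanding test.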

@article{wasfy2025_2506.02295,
  title={QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation},
  author={Ahmed Wasfy and Omer Nacar and Abdelakreem Elkhateb and Mahmoud Reda and Omar Elshehy and Adel Ammar and Wadii Boulila},
  journal={arXiv preprint arXiv:2506.02295},
  year={2025}
}