Behind Maya: Building a Multilingual Vision Language Model

13 May 2025
Nahid Alam
Karthik Reddy Kanjula
Surya Guthikonda
Timothy Chung
Bala Krishna S Vegesna
Abhipsha Das
Anthony Susevski
Ryan Sze-Yin Chan
S M Iftekhar Uddin
Shayekh Bin Islam
Roshan Santhosh
Snegha A
Drishti Sharma
Chen Liu
Isha Chaturvedi
Genta Indra Winata
Ashvanth.S
Snehanshu Mukherjee
Alham Fikri Aji
    MLLM
    VLM
Abstract

In recent times, we have seen rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages, but perform poorly on low-resource languages and in varied cultural contexts. To address these limitations, we introduce Maya, an open-source Multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at this https URL.
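The first contribution above is essentially a data-construction step: extending the English LLaVA pretraining image-text pairs to additional languages. A minimal sketch of what such a translation pass could look like is given below; this is an illustration only, not the authors' released pipeline, and the dataset name, record schema, target-language list, and translate_text() helper are all assumptions.

```python
# Illustrative sketch only -- NOT the authors' released Maya pipeline.
# Assumptions: the Hugging Face dataset name, the record schema
# ({"image", "conversations": [{"from", "value"}, ...]}), the target-language
# list, and the translate_text() helper are hypothetical placeholders.
from datasets import load_dataset

# Assumed target languages; with English these would make up eight languages.
TARGET_LANGUAGES = ["ar", "zh", "fr", "hi", "ja", "ru", "es"]


def translate_text(text: str, target_lang: str) -> str:
    """Placeholder for any machine-translation backend (API or local model)."""
    raise NotImplementedError("plug in a translation service here")


def build_multilingual_pairs(source_name: str = "liuhaotian/LLaVA-Pretrain"):
    """Translate each English caption/conversation into every target language,
    keeping the original English record as well."""
    source = load_dataset(source_name, split="train")
    records = []
    for example in source:
        records.append({**example, "language": "en"})  # keep the original pair
        for lang in TARGET_LANGUAGES:
            records.append({
                "image": example["image"],
                "language": lang,
                "conversations": [
                    {**turn, "value": translate_text(turn["value"], lang)}
                    for turn in example["conversations"]
                ],
            })
    return records
```

In practice one would batch the translation calls and filter failed or low-quality translations before pretraining; the sketch only shows the overall shape of the dataset expansion.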

@article{alam2025_2505.08910,
  title={Behind Maya: Building a Multilingual Vision Language Model},
  author={Nahid Alam and Karthik Reddy Kanjula and Surya Guthikonda and Timothy Chung and Bala Krishna S Vegesna and Abhipsha Das and Anthony Susevski and Ryan Sze-Yin Chan and S M Iftekhar Uddin and Shayekh Bin Islam and Roshan Santhosh and Snegha A and Drishti Sharma and Chen Liu and Isha Chaturvedi and Genta Indra Winata and Ashvanth.S and Snehanshu Mukherjee and Alham Fikri Aji},
  journal={arXiv preprint arXiv:2505.08910},
  year={2025}
}