BnMMLU: Measuring Massive Multitask Language Understanding in Bengali

25 May 2025
Saman Sarker Joy
Main: 9 pages · Bibliography: 2 pages · Appendix: 7 pages · 13 figures · 8 tables
Abstract

The Massive Multitask Language Understanding (MMLU) benchmark has been widely used to evaluate language models across various domains. However, existing MMLU datasets primarily focus on high-resource languages such as English, which leaves low-resource languages like Bengali underrepresented. In this paper, we introduce BnMMLU, a benchmark for evaluating the multitask language understanding capabilities of language models in Bengali. The dataset spans 23 domains, including science, humanities, mathematics, and general knowledge, and is structured in a multiple-choice format to assess the factual knowledge, application-based problem-solving, and reasoning abilities of language models. It consists of 138,949 question-option pairs. We benchmark several proprietary and open-source large language models (LLMs) on the BnMMLU test set. Additionally, we annotate the test set with three cognitive categories (factual knowledge, procedural application, and reasoning) to gain deeper insights into model strengths and weaknesses across various cognitive tasks. The results reveal significant performance gaps, highlighting the need for improved pre-training and fine-tuning strategies tailored to Bengali data. We release the dataset and benchmark results to facilitate further research in this area.
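
As a rough illustration of how a multiple-choice benchmark of this kind can be scored per cognitive category, the sketch below computes per-category accuracy over a list of question records. The field names ("question", "options", "answer", "cognitive_category") and the predict callable are assumptions for illustration only; the released BnMMLU files may use a different schema.

# Minimal sketch: per-category accuracy on a BnMMLU-style test set.
# Record fields and the predict() interface are hypothetical, not taken
# from the paper or the released dataset.
from collections import defaultdict

def evaluate(records, predict):
    # predict(question, options) should return the index of the chosen
    # option, e.g. a thin wrapper around an LLM prompt.
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        choice = predict(rec["question"], rec["options"])
        cat = rec["cognitive_category"]  # factual / procedural / reasoning
        total[cat] += 1
        if choice == rec["answer"]:
            correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Usage with a trivial baseline that always picks the first option.
sample = [
    {"question": "বাংলাদেশের রাজধানী কোনটি?",
     "options": ["ঢাকা", "চট্টগ্রাম", "খুলনা", "রাজশাহী"],
     "answer": 0,
     "cognitive_category": "factual knowledge"},
]
print(evaluate(sample, lambda q, opts: 0))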

@article{joy2025_2505.18951,
  title={BnMMLU: Measuring Massive Multitask Language Understanding in Bengali},
  author={Saman Sarker Joy},
  journal={arXiv preprint arXiv:2505.18951},
  year={2025}
}