MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems

27 February 2025
Xinwu Ye
Chengfan Li
Siming Chen
Xiangru Tang
Wei Wei
Abstract

Recent advances in large language models (LLMs) and large vision-language models (LVLMs) have shown promise across many tasks, yet their scientific reasoning capabilities remain untested, particularly in multimodal settings. We present MMSciBench, a benchmark for evaluating mathematical and physical reasoning through text-only and text-image formats, with human-annotated difficulty levels, solutions with detailed explanations, and taxonomic mappings. Evaluation of state-of-the-art models reveals significant limitations, with even the best model achieving only 63.77% accuracy and particularly struggling with visual reasoning tasks. Our analysis exposes critical gaps in complex reasoning and visual-textual integration, establishing MMSciBench as a rigorous standard for measuring progress in multimodal scientific understanding. The code for MMSciBench is open-sourced at GitHub, and the dataset is available at Hugging Face.
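Since the abstract points to a Hugging Face release, the sketch below shows how one might load the benchmark and compute exact-match accuracy for a model with the datasets library. The dataset ID and the "question"/"answer" field names are assumptions for illustration, not the paper's documented API; consult the linked repository for the actual identifiers and evaluation protocol.

# Minimal sketch: load MMSciBench from Hugging Face and score a model.
# Dataset ID and field names below are hypothetical placeholders.
from datasets import load_dataset

ds = load_dataset("MMSciBench/MMSciBench", split="test")  # assumed ID/split

def exact_match_accuracy(predict):
    """Run `predict` (question string -> answer string) over every example
    and return the fraction of exact matches against reference answers."""
    correct = sum(
        predict(ex["question"]).strip() == ex["answer"].strip()
        for ex in ds
    )
    return correct / len(ds)

Note that exact match is only one plausible metric; the paper's own pipeline may use answer normalization or LLM-based grading, especially for the text-image questions.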

@article{ye2025_2503.01891,
  title={MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems},
  author={Xinwu Ye and Chengfan Li and Siming Chen and Xiangru Tang and Wei Wei},
  journal={arXiv preprint arXiv:2503.01891},
  year={2025}
}