ResearchTrend.AI

MMATH: A Multilingual Benchmark for Mathematical Reasoning

25 May 2025
Wenyang Luo
Wayne Xin Zhao
Jing Sha
Shijin Wang
Ji-Rong Wen
Author Contacts: wenyang_luo@outlook.com, batmanfly@gmail.com, jingsha@iflytek.com, sjwang3@iflytek.com
Main: 8 pages · Bibliography: 2 pages · Appendix: 6 pages · 11 figures · 15 tables
Abstract

The advent of large reasoning models, such as OpenAI o1 and DeepSeek R1, has significantly advanced complex reasoning tasks. However, their capabilities in multilingual complex reasoning remain underexplored, with existing efforts largely focused on simpler tasks like MGSM. To address this gap, we introduce MMATH, a benchmark for multilingual complex reasoning spanning 374 high-quality math problems across 10 typologically diverse languages. Using MMATH, we observe that even advanced models like DeepSeek R1 exhibit substantial performance disparities across languages and suffer from a critical off-target issue: generating responses in unintended languages. To address this, we explore strategies including prompting and training, demonstrating that reasoning in English and answering in the target language can simultaneously enhance performance and preserve target-language consistency. Our findings offer new insights and practical strategies for advancing the multilingual reasoning capabilities of large language models. Our code and data can be found at this https URL.
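The "reason in English, answer in the target language" prompting strategy mentioned in the abstract could be sketched as a simple prompt template. The wording and function name below are illustrative assumptions, not the paper's actual prompt:

```python
def build_crosslingual_prompt(problem: str, target_language: str) -> str:
    """Build a prompt that asks a reasoning model to think step by step
    in English but state its final answer in the target language.
    (Hypothetical template; the paper's exact prompt is not shown here.)"""
    return (
        "Solve the following math problem.\n"
        "Reason through the solution step by step in English, "
        f"then give your final answer in {target_language}.\n\n"
        f"Problem: {problem}\n"
    )

# Example: an English reasoning trace with a German final answer.
prompt = build_crosslingual_prompt("What is 12 * 7?", "German")
print(prompt)
```

Such a template would then be sent to the model under evaluation; the benchmark can check both answer correctness and whether the response language matches the requested target language.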

@article{luo2025_2505.19126,
  title={MMATH: A Multilingual Benchmark for Mathematical Reasoning},
  author={Wenyang Luo and Wayne Xin Zhao and Jing Sha and Shijin Wang and Ji-Rong Wen},
  journal={arXiv preprint arXiv:2505.19126},
  year={2025}
}