MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global financial domain, evaluating LLMs across modalities (text, vision, audio) and linguistic settings (monolingual, bilingual, multilingual) on domain-specific tasks. We introduce two novel task families: PolyFiQA-Easy and PolyFiQA-Expert, the first multilingual financial benchmarks requiring models to perform complex reasoning over mixed-language inputs; and EnglishOCR and SpanishOCR, the first OCR-embedded financial QA tasks challenging models to extract and reason over information from visual-text financial documents. Moreover, we propose a dynamic, difficulty-aware selection mechanism and curate a compact, balanced benchmark rather than simply aggregating existing datasets. Extensive evaluation of 22 state-of-the-art models reveals that even the strongest models, despite their general multimodal and multilingual capabilities, struggle dramatically when faced with complex cross-lingual and multimodal tasks in the financial domain. MultiFinBen is publicly released to foster transparent, reproducible, and inclusive progress in financial studies and applications.
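The abstract does not detail how the difficulty-aware selection works; the sketch below is only an illustrative interpretation, assuming difficulty is proxied by the average accuracy of a set of reference models on each candidate dataset and that a fixed number of datasets is kept per difficulty bucket. The function names, thresholds, and data are hypothetical, not the paper's actual procedure.

```python
from statistics import mean

def bucket_difficulty(model_accuracies, easy_cut=0.75, hard_cut=0.40):
    """Assign an illustrative difficulty label from reference-model accuracies.

    model_accuracies: per-model accuracies on one dataset (0-1 scale).
    The cutoffs are assumed values, not taken from the paper.
    """
    avg = mean(model_accuracies)
    if avg >= easy_cut:
        return "easy"
    if avg <= hard_cut:
        return "hard"
    return "medium"

def select_balanced(datasets, per_bucket=3):
    """Keep at most `per_bucket` datasets per difficulty level so the
    resulting benchmark stays compact and balanced across easy/medium/hard."""
    buckets = {"easy": [], "medium": [], "hard": []}
    for name, accs in datasets.items():
        buckets[bucket_difficulty(accs)].append(name)
    return {level: names[:per_bucket] for level, names in buckets.items()}

if __name__ == "__main__":
    # Hypothetical reference-model accuracies for candidate datasets.
    candidates = {
        "sentiment_en": [0.91, 0.88, 0.85],
        "ner_es": [0.62, 0.55, 0.58],
        "polyfiqa_expert": [0.31, 0.27, 0.35],
    }
    print(select_balanced(candidates))
```

In this reading, the "dynamic" aspect would amount to re-bucketing datasets as stronger models shift the accuracy distribution, so the curated subset tracks current model capability rather than a fixed task list.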
@article{peng2025_2506.14028,
  title={MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation},
  author={Xueqing Peng and Lingfei Qian and Yan Wang and Ruoyu Xiang and Yueru He and Yang Ren and Mingyang Jiang and Jeff Zhao and Huan He and Yi Han and Yun Feng and Yuechen Jiang and Yupeng Cao and Haohang Li and Yangyang Yu and Xiaoyu Wang and Penglei Gao and Shengyuan Lin and Keyi Wang and Shanshan Yang and Yilun Zhao and Zhiwei Liu and Peng Lu and Jerry Huang and Suyuchen Wang and Triantafillos Papadopoulos and Polydoros Giannouris and Efstathia Soufleri and Nuo Chen and Guojun Xiong and Zhiyang Deng and Yijia Zhao and Mingquan Lin and Meikang Qiu and Kaleb E Smith and Arman Cohan and Xiao-Yang Liu and Jimin Huang and Alejandro Lopez-Lira and Xi Chen and Junichi Tsujii and Jian-Yun Nie and Sophia Ananiadou and Qianqian Xie},
  journal={arXiv preprint arXiv:2506.14028},
  year={2025}
}