FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

27 March 2026

Jie Zhu

Yimin Tian

Boyang Li

Kehao Wu

Zhongzhi Liang

Junhui Li

Xianyin Zhang

Lifan Guo

Feng Chen

Yong Liu

Chi Zhang

AIFin

ELM


Main: 7 Pages

5 Figures

Bibliography: 2 Pages

3 Tables

Abstract

This paper introduces FinMCP-Bench, a benchmark for evaluating how well large language models (LLMs) solve real-world financial problems by invoking tools exposed through the Model Context Protocol (MCP). FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, with both real and synthetic user queries to ensure diversity and authenticity. It incorporates 65 real financial MCPs and three types of samples (single-tool, multi-tool, and multi-turn), allowing evaluation of models across different levels of task complexity. Using this benchmark, we systematically assess a range of mainstream LLMs and propose metrics that explicitly measure tool invocation accuracy and reasoning capabilities. FinMCP-Bench provides a standardized, practical, and challenging testbed for advancing research on financial LLM agents.
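The abstract does not define the proposed metrics, so the following is only an illustrative sketch of what a tool-invocation score could look like, not the paper's method. It assumes a sample stores a list of expected tool calls and that a model's predicted calls are compared against them by tool name (F1) and by name plus arguments (exact match). The `Sample` layout, the function names, and the tool names `get_stock_quote` and `get_fund_nav` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation; arguments are stored as sorted pairs so calls are hashable."""
    name: str
    arguments: tuple


def make_call(name, **arguments):
    # Hypothetical helper: normalizes keyword arguments into a sorted tuple of pairs.
    return ToolCall(name, tuple(sorted(arguments.items())))


@dataclass
class Sample:
    """Assumed sample layout; the real benchmark schema may differ."""
    sample_type: str  # "single_tool", "multi_tool", or "multi_turn"
    query: str
    expected_calls: list


def tool_name_f1(expected, predicted):
    """F1 between the sets of expected and predicted tool names (arguments ignored)."""
    exp = {call.name for call in expected}
    pred = {call.name for call in predicted}
    if not exp and not pred:
        return 1.0
    true_positives = len(exp & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(exp)
    return 2 * precision * recall / (precision + recall)


def exact_match(expected, predicted):
    """True only if the predicted calls match the expected ones in name and arguments."""
    return set(expected) == set(predicted)


sample = Sample(
    sample_type="multi_tool",
    query="Compare the latest AAPL quote with the NAV of fund F001.",
    expected_calls=[
        make_call("get_stock_quote", symbol="AAPL"),
        make_call("get_fund_nav", fund_id="F001"),
    ],
)
# A model that issues only one of the two required calls.
predicted_calls = [make_call("get_stock_quote", symbol="AAPL")]

f1 = tool_name_f1(sample.expected_calls, predicted_calls)
matched = exact_match(sample.expected_calls, predicted_calls)
```

Here the model calls one of the two required tools, so precision is 1.0, recall is 0.5, and the name-level F1 is about 0.67, while the exact match fails. Scoring names and arguments separately is one plausible way to distinguish choosing the wrong tool from filling in the wrong parameters; an order-sensitive comparison would be needed if the sequence of calls matters, as it may in multi-turn samples.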
