Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation
Multimodal large language models (MLLMs) have achieved strong performance on perception-oriented tasks, yet their ability to perform mathematical spatial reasoning, defined as the capacity to parse and manipulate two- and three-dimensional relations, remains unclear. Humans solve textbook-style spatial reasoning problems with over 95\% accuracy, but we find that most leading MLLMs fail to reach even 60\% on the same tasks, a striking gap that exposes spatial reasoning as a fundamental weakness of current models. To investigate this gap, we present \emph{MathSpatial}, the first large-scale, systematically constructed dataset dedicated to mathematical spatial reasoning in MLLMs. \emph{MathSpatial} comprises two complementary subsets: (i)~\emph{MathSpatial-Bench}, a rigorously curated evaluation set of 2{,}000 problems spanning 3 categories and 11 subtypes, designed to isolate spatial reasoning from perceptual noise; and (ii)~\emph{MathSpatial-Corpus}, a training set of 8{,}000 problems equipped with verified solutions and structured reasoning traces. All problems are sourced from authentic educational materials and undergo multi-stage quality control, including deduplication, geometric consistency checking, and cross-validated solution verification. Benchmarking 16 leading MLLMs on \emph{MathSpatial-Bench} confirms that spatial reasoning remains a persistent bottleneck: even GPT-5 trails human performance by over 35 percentage points, with particularly poor results on abstract deduction tasks. We further show that training on \emph{MathSpatial-Corpus} yields consistent improvements across model families, demonstrating the dataset's practical value for advancing spatial reasoning capabilities. \emph{MathSpatial} is publicly available at this https URL.