Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?

17 May 2025
Zihao Dongfang
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Danda Pani Paudel
Luc Van Gool
Kailun Yang
Xuming Hu
Abstract

The 180°×360° omnidirectional field of view captured by 360-degree cameras enables their use in a wide range of applications such as embodied AI and virtual reality. Although recent advances in multimodal large language models (MLLMs) have shown promise in visual-spatial reasoning, most studies focus on standard pinhole-view images, leaving omnidirectional perception largely unexplored. In this paper, we ask: Are MLLMs ready for omnidirectional spatial reasoning? To investigate this, we introduce OSR-Bench, the first benchmark specifically designed for this setting. OSR-Bench includes over 153,000 diverse question-answer pairs grounded in high-fidelity panoramic indoor scene maps. It covers key reasoning types including object counting, relative distance, and direction. We also propose a negative sampling strategy that inserts non-existent objects into prompts to evaluate hallucination and grounding robustness. For fine-grained analysis, we design a two-stage evaluation framework assessing both cognitive map generation and QA accuracy, using rotation-invariant matching and a combination of rule-based and LLM-based metrics. We evaluate eight state-of-the-art MLLMs, including GPT-4o, Gemini 1.5 Pro, and leading open-source models, under zero-shot settings. Results show that current models struggle with spatial reasoning in panoramic contexts, highlighting the need for more perceptually grounded MLLMs. OSR-Bench and code will be released at: this https URL
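The abstract does not specify how rotation-invariant matching of cognitive maps is computed, so the sketch below only illustrates the general idea rather than the benchmark's actual metric. It assumes, hypothetically, that a cognitive map is a square top-down grid of object labels and that invariance is taken over the four 90-degree rotations; the names map_accuracy and rotation_invariant_score are illustrative, not from the paper.

import numpy as np

def map_accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    # Fraction of grid cells whose object label matches the ground truth.
    return float((pred == gt).mean())

def rotation_invariant_score(pred: np.ndarray, gt: np.ndarray) -> float:
    # Score a predicted cognitive map against the ground truth, keeping the
    # best match over the four 90-degree rotations of the prediction.
    # Both maps are assumed to be square grids of object-label strings,
    # with '' marking empty cells (an assumption, not the paper's format).
    return max(map_accuracy(np.rot90(pred, k), gt) for k in range(4))

# Toy usage: a 3x3 map that matches the ground truth only after rotation.
gt = np.array([["bed", "", ""],
               ["", "table", ""],
               ["", "", "sofa"]])
pred = np.rot90(gt, k=1)  # model output anchored to a different heading
print(rotation_invariant_score(pred, gt))  # 1.0

Taking the maximum over rotations keeps such a metric from penalizing a model that recovers the room layout correctly but anchors it to a different heading, an ambiguity inherent to panoramic input.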

@article{dongfang2025_2505.11907,
  title={Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?},
  author={Zihao Dongfang and Xu Zheng and Ziqiao Weng and Yuanhuiyi Lyu and Danda Pani Paudel and Luc Van Gool and Kailun Yang and Xuming Hu},
  journal={arXiv preprint arXiv:2505.11907},
  year={2025}
}