NavBench: Probing Multimodal Large Language Models for Embodied Navigation

1 June 2025
Yanyuan Qiao, Haodong Hong, Wenqi Lyu, Dong An, Siqi Zhang, Yutong Xie, Xinyu Wang, Qi Wu
Main: 9 pages · Bibliography: 5 pages · 8 figures · 4 tables
Abstract

Multimodal Large Language Models (MLLMs) have demonstrated strong generalization in vision-language tasks, yet their ability to understand and act within embodied environments remains underexplored. We present NavBench, a benchmark for evaluating the embodied navigation capabilities of MLLMs in zero-shot settings. NavBench consists of two components: (1) navigation comprehension, assessed through three cognitively grounded tasks (global instruction alignment, temporal progress estimation, and local observation-action reasoning) covering 3,200 question-answer pairs; and (2) step-by-step execution in 432 episodes across 72 indoor scenes, stratified by spatial, cognitive, and execution complexity. To support real-world deployment, we introduce a pipeline that converts MLLMs' outputs into robotic actions. We evaluate both proprietary and open-source models and find that GPT-4o performs well across tasks, while lighter open-source models succeed in simpler cases. Results also show that models with higher comprehension scores tend to achieve better execution performance. Providing map-based context improves decision accuracy, especially in medium-difficulty scenarios. However, most models struggle with temporal understanding, particularly in estimating progress during navigation, which remains a key challenge.
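
The step-by-step execution setting and the output-to-action pipeline mentioned in the abstract can be pictured with a short sketch. The Python snippet below is a minimal, hypothetical illustration of a zero-shot evaluation loop: the env and mllm interfaces, the action names, and the prompt format are assumptions made for illustration and are not taken from NavBench's actual implementation.

# Hypothetical sketch of a zero-shot step-by-step navigation loop.
# The MLLM client, action space, and environment API are illustrative
# placeholders, not NavBench's actual code.

import re

# Assumed discrete action space for indoor navigation.
ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]

PROMPT_TEMPLATE = (
    "You are navigating an indoor scene.\n"
    "Instruction: {instruction}\n"
    "Actions taken so far: {history}\n"
    "Given the attached first-person observation, reply with exactly one of: "
    + ", ".join(ACTIONS) + "."
)

def parse_action(model_output: str) -> str:
    """Map the MLLM's free-form text to a discrete action; default to 'stop'."""
    text = model_output.lower()
    for action in ACTIONS:
        if re.search(r"\b" + action + r"\b", text):
            return action
    return "stop"

def run_episode(env, mllm, instruction: str, max_steps: int = 30) -> bool:
    """Query the MLLM at every step and execute the parsed action.

    `env` and `mllm` are placeholder interfaces assumed for this sketch:
    env.observe() returns the current first-person image, env.step(action)
    advances the simulator, env.reached_goal() checks success, and
    mllm.generate(prompt, image) returns the model's textual reply.
    """
    history = []
    for _ in range(max_steps):
        prompt = PROMPT_TEMPLATE.format(
            instruction=instruction,
            history=", ".join(history) or "none",
        )
        action = parse_action(mllm.generate(prompt, image=env.observe()))
        history.append(action)
        if action == "stop":
            break
        env.step(action)
    return env.reached_goal()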

@article{qiao2025_2506.01031,
  title={NavBench: Probing Multimodal Large Language Models for Embodied Navigation},
  author={Yanyuan Qiao and Haodong Hong and Wenqi Lyu and Dong An and Siqi Zhang and Yutong Xie and Xinyu Wang and Qi Wu},
  journal={arXiv preprint arXiv:2506.01031},
  year={2025}
}