DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation

Adaptive navigation in unfamiliar environments is crucial for household service robots, but it remains challenging because it requires both low-level path planning and high-level scene understanding. Recent zero-shot approaches based on vision-language models (VLMs) reduce dependence on prior maps and scene-specific training data, yet they face significant limitations: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding that leads to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitively inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. The Dorsal Stream implements the Hierarchical Semantic-Spatial Fusion and Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines RAG-VLM and Policy-VLM to improve decision-making. Our approach also develops Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both the success rate (SR) and success weighted by path length (SPL) metrics, significantly outperforming existing methods. We also introduce a new evaluation metric (AORI) to better assess navigation intelligence. Comprehensive experiments demonstrate DORAEMON's effectiveness in zero-shot autonomous navigation without requiring prior map building or pre-training.
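For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how SR and SPL are commonly computed over a set of episodes, using the standard definition SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the binary success indicator, l_i the shortest-path length to the goal, and p_i the length of the path the agent actually took. The function names and the sample episode values are illustrative assumptions, not taken from the paper or its evaluation code, and AORI is not covered because the abstract does not define it.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the goal (SR)."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by path length, averaged over all episodes.

    Failed episodes contribute 0; successful ones contribute
    l_i / max(p_i, l_i), so a detour lowers the score.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (len(shortest_lengths) == len(path_lengths) == n):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if ok:
            total += shortest / max(taken, shortest)
    return total / n


if __name__ == "__main__":
    # Hypothetical episodes: one optimal success, one success with a
    # path twice the optimal length, and one failure.
    successes = [True, True, False]
    shortest = [10.0, 10.0, 10.0]
    taken = [10.0, 20.0, 5.0]
    print(f"SR  = {success_rate(successes):.3f}")
    print(f"SPL = {spl(successes, shortest, taken):.3f}")
```

Because SPL scales each success by path efficiency, it can never exceed SR on the same episodes, which is why the two are usually reported together.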
@article{gu2025_2505.21969,
  title={DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation},
  author={Tianjun Gu and Linfeng Li and Xuhong Wang and Chenghua Gong and Jingyu Gong and Zhizhong Zhang and Yuan Xie and Lizhuang Ma and Xin Tan},
  journal={arXiv preprint arXiv:2505.21969},
  year={2025}
}