
OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents

Akashah Shabbir
Muhammad Umer Sheikh
Muhammad Akhtar Munir
Hiyam Debary
Mustansar Fiaz
Muhammad Zaigham Zaheer
Paolo Fraccaro
Fahad Shahbaz Khan
Muhammad Haris Khan
Xiao Xiang Zhu
Salman Khan
Main: 15 pages, 12 figures, 6 tables; Bibliography: 4 pages; Appendix: 7 pages
Abstract

Recent progress in multimodal reasoning has enabled agents that interpret imagery, connect it with language, and execute structured analytical tasks. Extending these capabilities to remote sensing remains challenging, as models must reason over spatial scale, geographic structures, and multispectral indices while maintaining coherent multi-step logic. To address this gap, we introduce OpenEarthAgent, a unified framework for tool-augmented geospatial reasoning trained on satellite imagery, natural-language queries, and structured reasoning traces. Beyond serving as a benchmark, OpenEarthAgent establishes a cohesive agentic architecture built around a unified executable tool registry and trajectory-based policy learning. The framework standardizes heterogeneous visual, spectral, GIS, and georeferenced raster operations under a consistent callable schema, enabling modular orchestration and deterministic execution. Training is performed via supervised fine-tuning on structured reasoning trajectories with deterministic replay validation to ensure executability and spatial correctness. The accompanying corpus comprises 14,538 training and 1,169 evaluation instances with over 107K reasoning steps, spanning urban, environmental, disaster, and infrastructure domains and incorporating GIS operations alongside index analyses such as NDVI, NBR, and NDBI. Grounded in explicit reasoning traces, the learned agent demonstrates structured reasoning, stable spatial understanding, and interpretable tool-driven behaviour across diverse Earth observation (EO) scenarios. We report consistent improvements over a strong baseline and competitive performance against recent open- and closed-source models. Our code and trained models will be publicly available.
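To make the "unified executable tool registry" and spectral-index operations concrete, the following is a minimal Python sketch of what a callable tool schema with deterministic execution might look like. The registry structure, tool names, and function signatures here are illustrative assumptions, not the paper's actual implementation; only the index formulas (NDVI and NBR) follow their standard definitions.

```python
import numpy as np

# Hypothetical registry mapping tool names to callables and short descriptions.
# This is an assumed structure; the paper's actual registry schema is not shown here.
TOOL_REGISTRY = {}

def register_tool(name, description):
    """Register a function under a consistent callable schema."""
    def decorator(fn):
        TOOL_REGISTRY[name] = {"fn": fn, "description": description}
        return fn
    return decorator

@register_tool("compute_ndvi", "Normalized Difference Vegetation Index from red and NIR bands")
def compute_ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    # Standard definition: NDVI = (NIR - Red) / (NIR + Red); epsilon avoids division by zero.
    return (nir - red) / (nir + red + 1e-9)

@register_tool("compute_nbr", "Normalized Burn Ratio from NIR and SWIR bands")
def compute_nbr(nir: np.ndarray, swir: np.ndarray) -> np.ndarray:
    # Standard definition: NBR = (NIR - SWIR) / (NIR + SWIR).
    return (nir - swir) / (nir + swir + 1e-9)

def call_tool(name, **kwargs):
    """Execute a registered tool deterministically, as one replayable reasoning step."""
    entry = TOOL_REGISTRY[name]
    return entry["fn"](**kwargs)

if __name__ == "__main__":
    red = np.random.rand(4, 4).astype(np.float32)
    nir = np.random.rand(4, 4).astype(np.float32)
    ndvi = call_tool("compute_ndvi", red=red, nir=nir)
    print("Registered tools:", list(TOOL_REGISTRY))
    print("NDVI range:", float(ndvi.min()), float(ndvi.max()))
```

Under this kind of schema, an agent's reasoning trace can name a tool and its arguments explicitly, which is what makes deterministic replay validation of each step possible.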
