Where am I? Cross-View Geo-localization with Natural Language Descriptions

22 December 2024
Junyan Ye
Honglin Lin
Leyan Ou
Dairong Chen
Zihao Wang
Zeang Sheng
Weijia Li
Abstract

Cross-view geo-localization identifies the location of a street-view image by matching it against geo-tagged satellite imagery or OpenStreetMap (OSM) data. However, most existing studies focus on image-to-image retrieval, with fewer addressing text-guided retrieval, a task vital for applications such as pedestrian navigation and emergency response. In this work, we introduce a novel task of cross-view geo-localization with natural language descriptions, which aims to retrieve the corresponding satellite images or OSM data based on scene text descriptions. To support this task, we construct the CVG-Text dataset by collecting cross-view data from multiple cities and employing a scene text generation approach that leverages the annotation capabilities of Large Multimodal Models to produce high-quality scene text descriptions with localization details. Additionally, we propose a novel text-based retrieval localization method, CrossText2Loc, which improves recall by 10% and demonstrates excellent long-text retrieval capabilities. In terms of explainability, it not only provides similarity scores but also offers retrieval reasons. More information can be found at this https URL.
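The core retrieval step described in the abstract — matching a scene text description against a gallery of satellite-image embeddings by similarity score — can be illustrated with a minimal sketch. This is not the paper's CrossText2Loc method; it assumes hypothetical text and image embeddings (e.g., from any cross-modal encoder) and simply ranks gallery candidates by cosine similarity:

```python
import numpy as np

def rank_candidates(text_emb, image_embs):
    """Rank candidate satellite-image embeddings by cosine similarity
    to a text-description embedding (higher score = better match)."""
    t = text_emb / np.linalg.norm(text_emb)
    g = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = g @ t                # cosine similarity per candidate
    order = np.argsort(-scores)  # indices of best matches first
    return order, scores[order]

# Toy example: 3 hypothetical 4-dim gallery embeddings; the query text
# embedding is a slightly perturbed copy of candidate 1.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(3, 4))
query = gallery[1] + 0.05 * rng.normal(size=4)
order, scores = rank_candidates(query, gallery)
print(order[0])
```

Recall@k in this setting is then just the fraction of queries whose ground-truth candidate appears among the top-k indices of `order`.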

@article{ye2025_2412.17007,
  title={Where am I? Cross-View Geo-localization with Natural Language Descriptions},
  author={Junyan Ye and Honglin Lin and Leyan Ou and Dairong Chen and Zihao Wang and Qi Zhu and Conghui He and Weijia Li},
  journal={arXiv preprint arXiv:2412.17007},
  year={2025}
}