ResearchTrend.AI
Multimodal-Aware Fusion Network for Referring Remote Sensing Image Segmentation

14 March 2025
Leideng Shi
Juan Zhang
Abstract

Referring remote sensing image segmentation (RRSIS) is a novel visual task in remote sensing image segmentation that aims to segment objects based on a given text description, and it has great significance in practical applications. Previous studies fuse the visual and linguistic modalities through explicit feature interaction, which fails to effectively excavate useful multimodal information from the dual-branch encoder. In this letter, we design a multimodal-aware fusion network (MAFN) to achieve fine-grained alignment and fusion between the two modalities. We propose a correlation fusion module (CFM) that enhances multi-scale visual features by adaptively introducing noise into the transformer and integrates cross-modal aware features. In addition, MAFN employs multi-scale refinement convolution (MSRC) to adapt to the various orientations of objects at different scales, boosting their representational ability and enhancing segmentation accuracy. Extensive experiments show that MAFN is significantly more effective than the state of the art on the RRSIS-D dataset. The source code is available at this https URL.
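The paper's CFM itself is not reproduced here, but the cross-modal fusion it builds on — conditioning visual tokens on linguistic tokens — can be illustrated with a minimal single-head cross-attention sketch. All shapes, names, and the NumPy formulation are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(visual, text):
    """Fuse visual tokens with text tokens via cross-attention.

    visual: (N_v, d) array of visual features, used as queries.
    text:   (N_t, d) array of word-level linguistic features,
            used as keys and values.
    Returns a (N_v, d) array of text-conditioned visual features.
    """
    d = visual.shape[-1]
    scores = visual @ text.T / np.sqrt(d)  # (N_v, N_t) affinity map
    attn = softmax(scores, axis=-1)        # each visual token attends over words
    return attn @ text                     # aggregate linguistic context per token

rng = np.random.default_rng(0)
vis = rng.standard_normal((196, 64))  # e.g. a 14x14 grid of patch features
txt = rng.standard_normal((12, 64))   # e.g. 12 word embeddings of the referring text
fused = cross_modal_attention(vis, txt)
print(fused.shape)  # (196, 64)
```

In a full dual-branch model this fusion would typically be applied at several encoder scales and combined with residual connections; the sketch keeps only the core attention step.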

@article{shi2025_2503.11183,
  title={Multimodal-Aware Fusion Network for Referring Remote Sensing Image Segmentation},
  author={Leideng Shi and Juan Zhang},
  journal={arXiv preprint arXiv:2503.11183},
  year={2025}
}