Exploring Text-Guided Single Image Editing for Remote Sensing Images

Artificial intelligence generated content (AIGC) has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep learning based editing methods generally involve two sequential stages: generation and editing. For natural images, these stages primarily rely on generative backbones pre-trained on large-scale benchmark datasets and text guidance facilitated by vision-language models (VLMs). However, this paradigm becomes less viable for RSIs: first, existing generative RSI benchmark datasets do not fully capture the diversity of RSIs and are often inadequate for universal editing tasks; second, a single text semantic can correspond to multiple image semantics, leading to the introduction of incorrect semantics. To solve the above problems, this paper proposes a text-guided RSI editing method that can be trained using only a single image. A multi-scale training approach is adopted to preserve consistency without the need for training on extensive benchmarks, while leveraging RSI pre-trained VLMs and prompt ensembling (PE) to ensure accuracy and controllability. Experimental results on multiple RSI editing tasks show that the proposed method offers significant advantages in both CLIP scores and subjective evaluations compared to existing methods. Additionally, we explore the ability of the edited RSIs to support disaster assessment tasks in order to validate their practicality. Codes will be released at this https URL
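The abstract names two reusable ingredients: prompt ensembling (PE), which averages the text embeddings of several phrasings of the same instruction to reduce text-to-image ambiguity, and CLIP scores, which measure image-text agreement. The sketch below illustrates both with a generic CLIP backbone. It is a minimal illustration, not the paper's implementation: the checkpoint openai/clip-vit-base-patch32 stands in for the RSI pre-trained VLM the paper uses, and the template list is hypothetical.

# Minimal sketch of prompt ensembling (PE) and CLIP-score evaluation.
# Assumes the generic openai/clip-vit-base-patch32 checkpoint from the
# `transformers` library; the paper instead uses an RSI pre-trained VLM,
# so the model name and prompt templates here are illustrative only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical prompt templates: PE averages the embeddings of several
# phrasings so no single wording dominates the guidance signal.
templates = [
    "a satellite image of {}",
    "an aerial photo of {}",
    "a remote sensing image showing {}",
]

def ensembled_text_embedding(concept: str) -> torch.Tensor:
    """Mean of L2-normalized CLIP text embeddings over all templates."""
    prompts = [t.format(concept) for t in templates]
    inputs = processor(text=prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    mean = emb.mean(dim=0)
    return mean / mean.norm()

def clip_score(image: Image.Image, concept: str) -> float:
    """CLIP score: scaled cosine similarity of image and ensembled text."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        img = model.get_image_features(**inputs)
    img = img / img.norm(dim=-1, keepdim=True)
    return 100.0 * float(img.squeeze(0) @ ensembled_text_embedding(concept))

Averaging normalized embeddings and renormalizing is the standard PE recipe from the original CLIP work; it makes the guidance less sensitive to any one prompt wording, which matters for RSIs where one text semantic can map to several image semantics.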
@article{han2025_2405.05769,
  title   = {Exploring Text-Guided Single Image Editing for Remote Sensing Images},
  author  = {Fangzhou Han and Lingyu Si and Zhizhuo Jiang and Hongwei Dong and Lamei Zhang and Yu Liu and Hao Chen and Bo Du},
  journal = {arXiv preprint arXiv:2405.05769},
  year    = {2025}
}