Exploring Text-Guided Single Image Editing for Remote Sensing Images

Bibliographic Details
Main Authors: Fangzhou Han, Lingyu Si, Hongwei Dong, Zhizhuo Jiang, Lamei Zhang, Hao Chen, Yu Liu, Bo Du
Format: Article
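Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Prompt ensembling (PE); remote sensing image (RSI) editing; single image diffusion; text-guided image editing
Online Access: https://ieeexplore.ieee.org/document/11059760/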
_version_ 1849728141179748352
author Fangzhou Han
Lingyu Si
Hongwei Dong
Zhizhuo Jiang
Lamei Zhang
Hao Chen
Yu Liu
Bo Du
collection DOAJ
description Artificial intelligence-generated content has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep learning-based editing methods generally involve two sequential stages: generation and editing. For natural images, these stages primarily rely on generative backbones pretrained on large-scale benchmark datasets and on text guidance facilitated by vision–language models (VLMs). However, this approach becomes less viable for RSIs. First, existing generative RSI benchmark datasets do not fully capture the diversity of RSIs and are often inadequate for universal editing tasks. Second, a single text semantic can correspond to multiple image semantics, which leads to the introduction of incorrect semantics. To solve these problems, this article proposes a text-guided RSI editing method that can be trained using only a single image. A multiscale training approach is adopted to preserve consistency without the need for training on extensive benchmarks, while RSI-pretrained VLMs and prompt ensembling are leveraged to ensure accuracy and controllability. Experimental results on multiple RSI editing tasks show that the proposed method offers significant advantages over existing methods in both CLIP scores and subjective evaluations. In addition, we explore the ability of the edited RSIs to support disaster assessment tasks in order to validate their practicality.
format Article
id doaj-art-0c3dcb4075de449188110b485ad28cb3
institution DOAJ
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-0c3dcb4075de449188110b485ad28cb3
2025-08-20T03:09:37Z
eng
IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
1939-1404
2151-1535
2025-01-01
18
18117
18133
10.1109/JSTARS.2025.3584418
11059760
Exploring Text-Guided Single Image Editing for Remote Sensing Images
Fangzhou Han [0] https://orcid.org/0009-0005-0709-9503
Lingyu Si [1] https://orcid.org/0000-0002-7735-6676
Hongwei Dong [2] https://orcid.org/0000-0003-2629-2892
Zhizhuo Jiang [3] https://orcid.org/0000-0002-5269-2753
Lamei Zhang [4] https://orcid.org/0000-0002-3595-0001
Hao Chen [5] https://orcid.org/0000-0002-1837-3986
Yu Liu [6] https://orcid.org/0000-0002-5216-3181
Bo Du [7] https://orcid.org/0000-0002-0059-8458
[0] Department of Information Engineering, Harbin Institute of Technology, Harbin, China
[1] National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, National Key Laboratory of Information Systems Engineering, Beijing, China
[2] National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, National Key Laboratory of Information Systems Engineering, Beijing, China
[3] Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
[4] Department of Information Engineering, Harbin Institute of Technology, Harbin, China
[5] Department of Information Engineering, Harbin Institute of Technology, Harbin, China
[6] Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
[7] Hubei Luojia Laboratory, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China
https://ieeexplore.ieee.org/document/11059760/
Prompt ensembling (PE)
remote sensing image (RSI) editing
single image diffusion
text-guided image editing
title Exploring Text-Guided Single Image Editing for Remote Sensing Images
topic Prompt ensembling (PE)
remote sensing image (RSI) editing
single image diffusion
text-guided image editing
url https://ieeexplore.ieee.org/document/11059760/