Exploring Text-Guided Single Image Editing for Remote Sensing Images
Artificial intelligence generative content has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep learning-based editing methods generally involve two sequential...
| Main Authors: | Fangzhou Han, Lingyu Si, Hongwei Dong, Zhizhuo Jiang, Lamei Zhang, Hao Chen, Yu Liu, Bo Du |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
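| Subjects: | Prompt ensembling (PE); remote sensing image (RSI) editing; single image diffusion; text-guided image editing |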
| Online Access: | https://ieeexplore.ieee.org/document/11059760/ |
| _version_ | 1849728141179748352 |
|---|---|
| author | Fangzhou Han Lingyu Si Hongwei Dong Zhizhuo Jiang Lamei Zhang Hao Chen Yu Liu Bo Du |
| author_facet | Fangzhou Han Lingyu Si Hongwei Dong Zhizhuo Jiang Lamei Zhang Hao Chen Yu Liu Bo Du |
| author_sort | Fangzhou Han |
| collection | DOAJ |
| description | Artificial intelligence generative content has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep learning-based editing methods generally involve two sequential stages: generation and editing. For natural images, these stages primarily rely on generative backbones pretrained on large-scale benchmark datasets and text guidance facilitated by vision–language models (VLMs). However, this approach becomes less viable for RSIs. First, existing generative RSI benchmark datasets do not fully capture the diversity of RSIs and are often inadequate for universal editing tasks. Second, a single text semantic can correspond to multiple image semantics, leading to the introduction of incorrect semantics. To solve these problems, this article proposes a text-guided RSI editing method that can be trained using only a single image. A multiscale training approach is adopted to preserve consistency without the need for training on extensive benchmarks, while RSI-pretrained VLMs and prompt ensembling are leveraged to ensure accuracy and controllability. Experimental results on multiple RSI editing tasks show that the proposed method offers significant advantages in both CLIP scores and subjective evaluations compared to existing methods. In addition, we explore the ability of the edited RSIs to support disaster assessment tasks in order to validate their practicality. |
| format | Article |
| id | doaj-art-0c3dcb4075de449188110b485ad28cb3 |
| institution | DOAJ |
| issn | 1939-1404 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | doaj-art-0c3dcb4075de449188110b485ad28cb3; 2025-08-20T03:09:37Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; 2025-01-01; vol. 18, pp. 18117-18133; DOI 10.1109/JSTARS.2025.3584418; document 11059760; Exploring Text-Guided Single Image Editing for Remote Sensing Images; Authors: Fangzhou Han (https://orcid.org/0009-0005-0709-9503; Department of Information Engineering, Harbin Institute of Technology, Harbin, China), Lingyu Si (https://orcid.org/0000-0002-7735-6676; National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, National Key Laboratory of Information Systems Engineering, Beijing, China), Hongwei Dong (https://orcid.org/0000-0003-2629-2892; National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, National Key Laboratory of Information Systems Engineering, Beijing, China), Zhizhuo Jiang (https://orcid.org/0000-0002-5269-2753; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China), Lamei Zhang (https://orcid.org/0000-0002-3595-0001; Department of Information Engineering, Harbin Institute of Technology, Harbin, China), Hao Chen (https://orcid.org/0000-0002-1837-3986; Department of Information Engineering, Harbin Institute of Technology, Harbin, China), Yu Liu (https://orcid.org/0000-0002-5216-3181; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China), Bo Du (https://orcid.org/0000-0002-0059-8458; Hubei Luojia Laboratory, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China); Abstract: Artificial intelligence generative content has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep learning-based editing methods generally involve two sequential stages: generation and editing. For natural images, these stages primarily rely on generative backbones pretrained on large-scale benchmark datasets and text guidance facilitated by vision–language models (VLMs). However, this approach becomes less viable for RSIs. First, existing generative RSI benchmark datasets do not fully capture the diversity of RSIs and are often inadequate for universal editing tasks. Second, a single text semantic can correspond to multiple image semantics, leading to the introduction of incorrect semantics. To solve these problems, this article proposes a text-guided RSI editing method that can be trained using only a single image. A multiscale training approach is adopted to preserve consistency without the need for training on extensive benchmarks, while RSI-pretrained VLMs and prompt ensembling are leveraged to ensure accuracy and controllability. Experimental results on multiple RSI editing tasks show that the proposed method offers significant advantages in both CLIP scores and subjective evaluations compared to existing methods. In addition, we explore the ability of the edited RSIs to support disaster assessment tasks in order to validate their practicality.; https://ieeexplore.ieee.org/document/11059760/; Keywords: Prompt ensembling (PE), remote sensing image (RSI) editing, single image diffusion, text-guided image editing |
| spellingShingle | Fangzhou Han Lingyu Si Hongwei Dong Zhizhuo Jiang Lamei Zhang Hao Chen Yu Liu Bo Du Exploring Text-Guided Single Image Editing for Remote Sensing Images IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Prompt ensembling (PE) remote sensing image (RSI) editing single image diffusion text-guided image editing |
| title | Exploring Text-Guided Single Image Editing for Remote Sensing Images |
| title_full | Exploring Text-Guided Single Image Editing for Remote Sensing Images |
| title_fullStr | Exploring Text-Guided Single Image Editing for Remote Sensing Images |
| title_full_unstemmed | Exploring Text-Guided Single Image Editing for Remote Sensing Images |
| title_short | Exploring Text-Guided Single Image Editing for Remote Sensing Images |
| title_sort | exploring text guided single image editing for remote sensing images |
| topic | Prompt ensembling (PE) remote sensing image (RSI) editing single image diffusion text-guided image editing |
| url | https://ieeexplore.ieee.org/document/11059760/ |
| work_keys_str_mv | AT fangzhouhan exploringtextguidedsingleimageeditingforremotesensingimages AT lingyusi exploringtextguidedsingleimageeditingforremotesensingimages AT hongweidong exploringtextguidedsingleimageeditingforremotesensingimages AT zhizhuojiang exploringtextguidedsingleimageeditingforremotesensingimages AT lameizhang exploringtextguidedsingleimageeditingforremotesensingimages AT haochen exploringtextguidedsingleimageeditingforremotesensingimages AT yuliu exploringtextguidedsingleimageeditingforremotesensingimages AT bodu exploringtextguidedsingleimageeditingforremotesensingimages |
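
The abstract in this record names prompt ensembling and CLIP scores without describing how the article implements them. As a rough illustration only, the sketch below shows prompt ensembling in its commonly used sense: several text templates for one concept are encoded, the unit-length embeddings are averaged, and the mean is renormalized. A CLIP-style score is then a cosine similarity between embeddings. Everything here is an assumption for illustration: `toy_text_embedding` is a hypothetical hash-based stand-in for a real VLM text encoder, and the templates and function names are invented. It is not the authors' method.

```python
import hashlib
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def toy_text_embedding(text, dim=8):
    """Hypothetical stand-in for a VLM text encoder (assumption).

    Produces a deterministic pseudo-embedding from a hash of the text,
    so the example runs without any model weights.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return l2_normalize([b / 255.0 - 0.5 for b in digest[:dim]])


def ensemble_prompt_embedding(concept, templates, encode=toy_text_embedding):
    """Average the embeddings of several prompts for one concept."""
    embeddings = [encode(t.format(concept)) for t in templates]
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine similarity, the quantity behind CLIP-style scores."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative templates (assumptions, not taken from the article).
TEMPLATES = [
    "a satellite image of {}",
    "an aerial photo of {}",
    "a remote sensing image of {}",
]

ensembled = ensemble_prompt_embedding("a flooded village", TEMPLATES)
single = toy_text_embedding(TEMPLATES[0].format("a flooded village"))
similarity = cosine_similarity(ensembled, single)
```

With a real encoder, `encode` would be replaced by the model's text-embedding call, and the ensembled vector would be compared against an image embedding rather than another text embedding.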