Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment

Bibliographic Details
Main Authors: Xinlan Ding, Huihui Song, Xu Zhang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Cross-communication mixture of expert (CMOE); cross attention; deformable convolution; global-local; loss of color consistency
Online Access: https://ieeexplore.ieee.org/document/10812681/
author Xinlan Ding
Huihui Song
Xu Zhang
author_facet Xinlan Ding
Huihui Song
Xu Zhang
author_sort Xinlan Ding
collection DOAJ
description Given a coarse-resolution remote sensing image on a prediction date as input, existing spatio-temporal fusion methods commonly use a pair of coarse- and fine-resolution images acquired close to the prediction date; these images serve as references to predict the corresponding fine-resolution image. Recently, this paradigm has shifted to an unpaired-reference setting that needs only one high-resolution reference image, with no restriction on its acquisition date. Despite this flexibility, current work under the paradigm suffers from the following issue: because of the relatively long revisit period, drastic changes caused by cloud contamination or floods may occur between acquisition dates. This produces large land cover changes, ranging from lost texture details to outright semantic category shifts (e.g., from land to water), which makes it difficult to obtain enough high-quality reference data and leads to severe model degradation. To address these problems, we propose the deformable global-local feature alignment network (DGFANet) for unpaired spatio-temporal fusion, which combines a convolutional neural network and a transformer to enhance texture and semantic details through global-local alignment. We design a feature alignment module that links each changed region with the surrounding stable regions to obtain global context information. Next, we perform feature fusion using a cross-communication mixture-of-experts module, which adaptively retains both local features and global representations. Finally, a color consistency loss is proposed to recover the color of the fused image. In experiments on two widely used public datasets, the Coleambally Irrigation Area and the Lower Gwydir Catchment, DGFANet matches or outperforms existing state-of-the-art methods.
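
To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of its three named ideas: deformable-convolution-based feature alignment, a gated local/global fusion in the spirit of a cross-communication mixture of experts, and a color consistency loss on per-band image statistics. Every module name, shape, and formulation here is an illustrative assumption inferred from the abstract, not the authors' actual DGFANet implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d


class DeformableAlignment(nn.Module):
    # Warps reference features toward the target with learned offsets,
    # letting changed regions borrow context from stable neighbors
    # (a hypothetical stand-in for the paper's feature alignment module).
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        # Offsets are predicted from the concatenated target/reference features.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=kernel_size // 2)
        self.weight = nn.Parameter(
            0.01 * torch.randn(channels, channels, kernel_size, kernel_size))

    def forward(self, target_feat, ref_feat):
        offsets = self.offset_pred(torch.cat([target_feat, ref_feat], dim=1))
        return deform_conv2d(ref_feat, offsets, self.weight,
                             padding=self.kernel_size // 2)


class CrossCommunicationMoE(nn.Module):
    # Fuses local (CNN) and global (transformer) features with a soft
    # per-pixel gate -- one plausible reading of "mixture of experts" fusion.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, local_feat, global_feat):
        logits = self.gate(torch.cat([local_feat, global_feat], dim=1))
        w = F.softmax(logits, dim=1)
        return w[:, :1] * local_feat + w[:, 1:] * global_feat


def color_consistency_loss(fused, target):
    # Penalizes per-band mean/std drift so the fused image keeps the
    # target's overall color statistics (an assumed formulation).
    mean_term = (fused.mean(dim=(2, 3)) - target.mean(dim=(2, 3))).abs().mean()
    std_term = (fused.std(dim=(2, 3)) - target.std(dim=(2, 3))).abs().mean()
    return mean_term + std_term


# Example usage with hypothetical 64-channel feature maps:
align = DeformableAlignment(64)
fuse = CrossCommunicationMoE(64)
tgt, ref = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
aligned = align(tgt, ref)   # reference warped toward the target
fused = fuse(tgt, aligned)  # gated fusion of local and aligned global context

The gate in CrossCommunicationMoE is a per-pixel softmax over two "experts", which is one simple way to let the network adaptively weight local texture against globally aligned context; the paper's module may differ substantially.
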
format Article
id doaj-art-4a0a7ebfeb8c4fa3a415c32284523b07
institution Kabale University
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-4a0a7ebfeb8c4fa3a415c32284523b07
2025-08-20T03:40:15Z
eng
IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
1939-1404
2151-1535
2025-01-01
vol. 18, pp. 7781-7793
10.1109/JSTARS.2024.3521415
10812681
Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment
Xinlan Ding (https://orcid.org/0009-0005-6405-8695), Jiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, China
Huihui Song (https://orcid.org/0000-0002-7275-9871), Jiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, China
Xu Zhang (https://orcid.org/0009-0004-5500-134X), School of Electronic Information Engineering, Suzhou Vocational University, Suzhou, China
https://ieeexplore.ieee.org/document/10812681/
spellingShingle Xinlan Ding
Huihui Song
Xu Zhang
Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Cross-communication mixture of expert (CMOE)
cross attention
deformable convolution
global-local
loss of color consistency
title Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment
title_full Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment
title_fullStr Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment
title_full_unstemmed Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment
title_short Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment
title_sort unpaired spatio temporal fusion for remote sensing images via deformable global local feature alignment
topic Cross-communication mixture of expert (CMOE)
cross attention
deformable convolution
global-local
loss of color consistency
url https://ieeexplore.ieee.org/document/10812681/
work_keys_str_mv AT xinlanding unpairedspatiotemporalfusionforremotesensingimagesviadeformablegloballocalfeaturealignment
AT huihuisong unpairedspatiotemporalfusionforremotesensingimagesviadeformablegloballocalfeaturealignment
AT xuzhang unpairedspatiotemporalfusionforremotesensingimagesviadeformablegloballocalfeaturealignment