Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment
Given a coarse-resolution remote sensing image on a prediction date as input, existing spatio-temporal fusion methods commonly use a pair of coarse- and fine-resolution images that are acquired close to the prediction date. These images serve as references to predict the corresponding fine-resolution...
Saved in:
| Main Authors: | Xinlan Ding, Huihui Song, Xu Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10812681/ |
| _version_ | 1849393867001954304 |
|---|---|
| author | Xinlan Ding Huihui Song Xu Zhang |
| author_facet | Xinlan Ding Huihui Song Xu Zhang |
| author_sort | Xinlan Ding |
| collection | DOAJ |
| description | Given a coarse-resolution remote sensing image on a prediction date as input, existing spatio-temporal fusion methods commonly use a pair of coarse- and fine-resolution images acquired close to the prediction date. These images serve as references to predict the corresponding fine-resolution image. Recently, this paradigm has shifted to an unpaired-reference setting that needs only one high-resolution reference image, with no restriction on its acquisition date. Despite this flexibility, current work in this paradigm suffers from the following issue: owing to the relatively long revisit period, drastic changes caused by cloud contamination or floods may occur between acquisitions. These changes produce large land-cover differences, including the loss of texture details and even semantic category shifts (e.g., from land to water), which make it difficult to obtain enough high-quality reference data and lead to severe model degradation. To address these problems, we propose the deformable global-local feature alignment network (DGFANet) for unpaired spatio-temporal fusion, which combines a convolutional neural network and a transformer to enhance texture and semantic details through global-local alignment. We design a feature alignment module that links a changed region with its surrounding stable regions to obtain global context information. Next, we perform feature fusion with a cross-communication mixture-of-experts module, which adaptively retains both local features and global representations. Finally, a color consistency loss is proposed to recover the color of the fused image. In the experiments, DGFANet matches or surpasses existing state-of-the-art methods on two widely used public datasets, the Coleambally irrigated area and the lower Gwydir catchment. |
| format | Article |
| id | doaj-art-4a0a7ebfeb8c4fa3a415c32284523b07 |
| institution | Kabale University |
| issn | 1939-1404 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | doaj-art-4a0a7ebfeb8c4fa3a415c32284523b072025-08-20T03:40:15ZengIEEEIEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing1939-14042151-15352025-01-01187781779310.1109/JSTARS.2024.352141510812681Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature AlignmentXinlan Ding0https://orcid.org/0009-0005-6405-8695Huihui Song1https://orcid.org/0000-0002-7275-9871Xu Zhang2https://orcid.org/0009-0004-5500-134XJiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, ChinaJiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, ChinaSchool of Electronic Information Engineering, Suzhou Vocational University, Suzhou, Chinahttps://ieeexplore.ieee.org/document/10812681/Cross-communication mixture of expert (CMOE)cross attentiondeformable convolutionglobal-localloss of color consistency |
| spellingShingle | Xinlan Ding Huihui Song Xu Zhang Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Cross-communication mixture of expert (CMOE) cross attention deformable convolution global-local loss of color consistency |
| title | Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment |
| title_full | Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment |
| title_fullStr | Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment |
| title_full_unstemmed | Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment |
| title_short | Unpaired Spatio-Temporal Fusion for Remote Sensing Images via Deformable Global-Local Feature Alignment |
| title_sort | unpaired spatio temporal fusion for remote sensing images via deformable global local feature alignment |
| topic | Cross-communication mixture of expert (CMOE) cross attention deformable convolution global-local loss of color consistency |
| url | https://ieeexplore.ieee.org/document/10812681/ |
| work_keys_str_mv | AT xinlanding unpairedspatiotemporalfusionforremotesensingimagesviadeformablegloballocalfeaturealignment AT huihuisong unpairedspatiotemporalfusionforremotesensingimagesviadeformablegloballocalfeaturealignment AT xuzhang unpairedspatiotemporalfusionforremotesensingimagesviadeformablegloballocalfeaturealignment |
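The abstract describes a cross-communication mixture-of-experts module that adaptively weights local (CNN) features against global (transformer) representations. The paper's actual CMOE design is not reproduced in this record; the following is only a minimal sketch of the general idea, a per-pixel softmax gate over two feature "experts". All names (`gated_fusion`, `w_gate`) and shapes are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(local_feat, global_feat, w_gate):
    """Fuse a local feature map and a global feature map with a
    per-pixel two-way softmax gate, in the spirit of a mixture of
    experts. local_feat, global_feat: (H, W, C); w_gate: (2C, 2),
    a learned gating projection (random here, for illustration)."""
    h, w, c = local_feat.shape
    stacked = np.concatenate([local_feat, global_feat], axis=-1)   # (H, W, 2C)
    logits = stacked.reshape(h * w, 2 * c) @ w_gate                # (H*W, 2)
    gates = softmax(logits, axis=-1).reshape(h, w, 2)              # sums to 1
    # Convex combination: each pixel adaptively keeps more of the
    # local texture or the global context, never discarding either.
    return gates[..., :1] * local_feat + gates[..., 1:] * global_feat
```

Because the gate is a softmax, the fused value at every position is a convex combination of the two inputs, which is one way to read the claim that the module "adaptively retains both local features and global representations."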