EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images

Bibliographic Details
Main Authors: Qiyuan Zhang, Xiaodan Zhang, Chen Quan, Tong Zhao, Wei Huo, Yuanchen Huang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11074731/
Description
Summary: Spatiotemporal fusion methods address the limitation that a single satellite cannot simultaneously provide high spatial and temporal resolution imagery. By integrating images with different spatial and temporal characteristics, it is possible to generate remote sensing data with enhanced detail and frequency. However, existing methods face the following challenges: 1) traditional approaches rely on linear assumptions; 2) convolutional neural networks in deep learning struggle to capture global context; 3) generative adversarial networks suffer from mode collapse; and 4) while Transformers excel at modeling global dependencies, they are computationally intensive. To overcome these limitations, we propose an efficient hierarchical Transformer-based spatiotemporal fusion method, named the efficient sparse Transformer spatiotemporal fusion model (EST-STFM). This is the first model to introduce a Top-$K$ sparse attention mechanism into spatiotemporal fusion for remote sensing. The EST-STFM consists of a feature extraction network and a multibranch feature fusion network. The extraction network includes the TopSparseNet (TSN) and a multibranch feedforward neural network (MFNN). The fusion network is built on the multibranch feature fusion block (MFFB), which integrates multiple TSNs to combine multiscale features. The TSN adopts a Top-$K$ sparse self-attention mechanism that reduces computational overhead while preserving critical local features; the MFNN improves multiscale representation learning; and the MFFB strengthens the fusion process by integrating features of different resolutions and semantic levels through four independent attention branches. Experimental results on three public datasets demonstrate that the EST-STFM outperforms existing methods in fusion performance. The effectiveness of each module is validated through ablation studies, while the model's robustness and practical utility are further confirmed through efficiency analysis and a clustering task.
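The Top-$K$ sparse self-attention described in the summary can be illustrated with a short sketch. The PyTorch snippet below shows one common formulation, in which all but the K largest attention scores per query are masked to negative infinity before the softmax, so each query attends only to its K most relevant keys. This is a minimal illustration under stated assumptions, not the paper's actual TSN implementation; the tensor shapes, the top_k parameter, and this particular masking strategy are assumptions for demonstration.

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k):
    # q, k, v: (batch, heads, seq_len, head_dim); illustrative shapes only.
    d = q.size(-1)
    # Dense scaled dot-product scores, as in standard self-attention.
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5
    # k-th largest score per query row (topk returns values sorted descending).
    topk_vals, _ = scores.topk(top_k, dim=-1)
    threshold = topk_vals[..., -1:]
    # Suppress everything below the threshold; ties at the threshold survive.
    masked = scores.masked_fill(scores < threshold, float("-inf"))
    weights = F.softmax(masked, dim=-1)  # zero weight on masked positions
    return torch.matmul(weights, v)

# Example: a 64-token sequence where each query keeps only its 8 strongest keys.
q = torch.randn(1, 4, 64, 32)
k = torch.randn(1, 4, 64, 32)
v = torch.randn(1, 4, 64, 32)
out = topk_sparse_attention(q, k, v, top_k=8)
print(out.shape)  # torch.Size([1, 4, 64, 32])

Note that this naive masking still computes the full score matrix; practical efficiency gains also require restricting which keys are gathered in the first place, a design detail not shown here.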
ISSN: 1939-1404
2151-1535