EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images

Spatiotemporal fusion methods address the limitation that a single satellite cannot simultaneously provide high spatial and temporal resolution imagery. By integrating images with different spatial and temporal characteristics, it is possible to generate remote sensing data with enhanced detail and...

Bibliographic Details
Main Authors: Qiyuan Zhang, Xiaodan Zhang, Chen Quan, Tong Zhao, Wei Huo, Yuanchen Huang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11074731/
_version_ 1849239021039910912
author Qiyuan Zhang
Xiaodan Zhang
Chen Quan
Tong Zhao
Wei Huo
Yuanchen Huang
author_facet Qiyuan Zhang
Xiaodan Zhang
Chen Quan
Tong Zhao
Wei Huo
Yuanchen Huang
author_sort Qiyuan Zhang
collection DOAJ
description Spatiotemporal fusion methods address the limitation that a single satellite cannot simultaneously provide high spatial and temporal resolution imagery. By integrating images with different spatial and temporal characteristics, it is possible to generate remote sensing data with enhanced detail and frequency. However, existing methods face the following challenges: 1) traditional approaches rely on linear assumptions; 2) convolutional neural networks in deep learning struggle to capture global context; 3) generative adversarial networks suffer from mode collapse; and 4) while Transformers excel at modeling global dependencies, they are computationally intensive. To overcome these limitations, we propose an efficient hierarchical Transformer-based spatiotemporal fusion method, named the efficient sparse Transformer spatiotemporal fusion model (EST-STFM). This is the first model to introduce a Top-$K$ sparse attention mechanism into spatiotemporal fusion for remote sensing. The EST-STFM consists of a feature extraction network and a multibranch feature fusion network. The extraction network includes the TopSparseNet (TSN) and a multibranch feedforward neural network (MFNN). The fusion network is built on the multibranch feature fusion block (MFFB), which integrates multiple TSNs to combine multiscale features. The TSN adopts a Top-$K$ sparse self-attention mechanism that reduces computational overhead while preserving critical local features; the MFNN improves multiscale representation learning; and the MFFB improves the fusion process by integrating features of different resolutions and semantic levels through four independent attention branches. Experimental results on three public datasets demonstrate that the EST-STFM outperforms existing methods in fusion performance. The effectiveness of each module is validated through ablation studies, and the model's robustness and practical utility are further confirmed through an efficiency analysis and a clustering task.
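For readers unfamiliar with the Top-$K$ sparse self-attention idea the abstract relies on, a minimal PyTorch sketch is given below. It is not the authors' EST-STFM implementation: the function name topk_sparse_attention, the tensor shapes, and the mask-before-softmax formulation are assumptions used purely to illustrate how keeping only each query's K highest-scoring keys sparsifies the attention map.

```python
# Illustrative sketch only: a plain Top-K sparse self-attention step showing the
# selection rule described in the abstract. NOT the authors' EST-STFM code; the
# function name, shapes, and dense-scores-then-mask formulation are assumptions.
import torch
import torch.nn.functional as F


def topk_sparse_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, top_k: int) -> torch.Tensor:
    """q, k, v: (batch, heads, tokens, head_dim); keep only the top_k keys per query."""
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale           # (B, H, N, N) similarity scores
    kth_score = scores.topk(top_k, dim=-1).values[..., -1:]         # K-th largest score for each query
    scores = scores.masked_fill(scores < kth_score, float("-inf"))  # drop everything below the Top-K
    attn = F.softmax(scores, dim=-1)                                # sparse attention weights
    return torch.matmul(attn, v)                                    # (B, H, N, head_dim)


# Toy usage: 64 tokens, 4 heads, 32-dim heads, each query attends to its 8 best keys.
q = k = v = torch.randn(1, 4, 64, 32)
out = topk_sparse_attention(q, k, v, top_k=8)
print(out.shape)  # torch.Size([1, 4, 64, 32])
```

Note that this dense-then-mask form only demonstrates the selection behavior; an efficiency-oriented implementation of the kind the paper targets would presumably gather the selected keys rather than materialize the full N-by-N score matrix.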
format Article
id doaj-art-4ee024b1649d44dda4d5c5f8a09476fe
institution Kabale University
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-4ee024b1649d44dda4d5c5f8a09476fe 2025-08-20T04:01:16Z eng IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, ISSN 1939-1404, 2151-1535, 2025-01-01, vol. 18, pp. 18633-18655, doi: 10.1109/JSTARS.2025.3587439, IEEE Xplore document 11074731
EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images
Qiyuan Zhang (https://orcid.org/0009-0006-1180-6669), School of Computer Technology and Application, Qinghai University, Xining, China
Xiaodan Zhang (https://orcid.org/0000-0002-4475-8011), School of Computer Technology and Application, Qinghai University, Xining, China
Chen Quan, Qinghai Provincial Institute of Meteorological Sciences, Xining, China
Tong Zhao, Qinghai Provincial Institute of Meteorological Sciences, Xining, China
Wei Huo, School of Computer Technology and Application, Qinghai University, Xining, China
Yuanchen Huang, School of Computer Technology and Application, Qinghai University, Xining, China
https://ieeexplore.ieee.org/document/11074731/
Multisource satellite; remote sensing; spatiotemporal fusion (STF); Top-$K$; Transformer
spellingShingle Qiyuan Zhang
Xiaodan Zhang
Chen Quan
Tong Zhao
Wei Huo
Yuanchen Huang
EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Multisource satellite
remote sensing
spatiotemporal fusion (STF)
Top-$K$
Transformer
title EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images
title_full EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images
title_fullStr EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images
title_full_unstemmed EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images
title_short EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images
title_sort est stfm an efficient deep learning based spatiotemporal fusion method for remote sensing images
topic Multisource satellite
remote sensing
spatiotemporal fusion (STF)
Top-$K$
Transformer
url https://ieeexplore.ieee.org/document/11074731/
work_keys_str_mv AT qiyuanzhang eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages
AT xiaodanzhang eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages
AT chenquan eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages
AT tongzhao eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages
AT weihuo eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages
AT yuanchenhuang eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages