EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images
Spatiotemporal fusion methods address the limitation that a single satellite cannot simultaneously provide imagery with both high spatial and high temporal resolution. By integrating images with different spatial and temporal characteristics, it is possible to generate remote sensing data with enhanced detail and frequency.
| Main Authors: | Qiyuan Zhang, Xiaodan Zhang, Chen Quan, Tong Zhao, Wei Huo, Yuanchen Huang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | Multisource satellite; remote sensing; spatiotemporal fusion (STF); Top-K; Transformer |
| Online Access: | https://ieeexplore.ieee.org/document/11074731/ |
| _version_ | 1849239021039910912 |
|---|---|
| author | Qiyuan Zhang; Xiaodan Zhang; Chen Quan; Tong Zhao; Wei Huo; Yuanchen Huang |
| author_facet | Qiyuan Zhang; Xiaodan Zhang; Chen Quan; Tong Zhao; Wei Huo; Yuanchen Huang |
| author_sort | Qiyuan Zhang |
| collection | DOAJ |
| description | Spatiotemporal fusion methods address the limitation that a single satellite cannot simultaneously provide imagery with both high spatial and high temporal resolution. By integrating images with different spatial and temporal characteristics, it is possible to generate remote sensing data with enhanced detail and frequency. However, existing methods face the following challenges: 1) traditional approaches rely on linear assumptions; 2) convolutional neural networks in deep learning struggle to capture global context; 3) generative adversarial networks suffer from mode collapse; and 4) while Transformers excel at modeling global dependencies, they are computationally intensive. To overcome these limitations, we propose an efficient hierarchical Transformer-based spatiotemporal fusion method, named the efficient sparse Transformer spatiotemporal fusion model (EST-STFM). This is the first model to introduce a Top-$K$ sparse attention mechanism into spatiotemporal fusion for remote sensing. The EST-STFM consists of a feature extraction network and a multibranch feature fusion network. The extraction network includes the TopSparseNet (TSN) and a multibranch feedforward neural network (MFNN). The fusion network is built on the multibranch feature fusion block (MFFB), which integrates multiple TSNs to combine multiscale features. The TSN adopts a Top-$K$ sparse self-attention mechanism that reduces computational overhead while preserving critical local features (a generic sketch of such a layer follows the record fields below); the MFNN improves multiscale representation learning; and the MFFB strengthens the fusion process by integrating features of different resolutions and semantic levels through four independent attention branches. Experimental results on three public datasets demonstrate that the EST-STFM outperforms existing methods in fusion performance. The effectiveness of each module is validated through ablation studies, while the model’s robustness and practical utility are further confirmed through efficiency analysis and a clustering task. |
| format | Article |
| id | doaj-art-4ee024b1649d44dda4d5c5f8a09476fe |
| institution | Kabale University |
| issn | 1939-1404; 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | Record ID: doaj-art-4ee024b1649d44dda4d5c5f8a09476fe (indexed 2025-08-20T04:01:16Z). Qiyuan Zhang (https://orcid.org/0009-0006-1180-6669), Xiaodan Zhang (https://orcid.org/0000-0002-4475-8011), Chen Quan, Tong Zhao, Wei Huo, Yuanchen Huang. "EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 18633–18655, 2025-01-01. Publisher: IEEE. ISSN: 1939-1404; 2151-1535. DOI: 10.1109/JSTARS.2025.3587439. IEEE document 11074731. Affiliations: School of Computer Technology and Application, Qinghai University, Xining, China (Qiyuan Zhang, Xiaodan Zhang, Wei Huo, Yuanchen Huang); Qinghai Provincial Institute of Meteorological Sciences, Xining, China (Chen Quan, Tong Zhao). Abstract as given in the description field above. Online access: https://ieeexplore.ieee.org/document/11074731/. Topics: Multisource satellite; remote sensing; spatiotemporal fusion (STF); Top-K; Transformer. |
| spellingShingle | Qiyuan Zhang; Xiaodan Zhang; Chen Quan; Tong Zhao; Wei Huo; Yuanchen Huang. EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Topics: Multisource satellite; remote sensing; spatiotemporal fusion (STF); Top-K; Transformer |
| title | EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images |
| title_full | EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images |
| title_fullStr | EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images |
| title_full_unstemmed | EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images |
| title_short | EST-STFM: An Efficient Deep-Learning-Based Spatiotemporal Fusion Method for Remote Sensing Images |
| title_sort | est stfm an efficient deep learning based spatiotemporal fusion method for remote sensing images |
| topic | Multisource satellite; remote sensing; spatiotemporal fusion (STF); Top-K; Transformer |
| url | https://ieeexplore.ieee.org/document/11074731/ |
| work_keys_str_mv | AT qiyuanzhang eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages AT xiaodanzhang eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages AT chenquan eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages AT tongzhao eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages AT weihuo eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages AT yuanchenhuang eststfmanefficientdeeplearningbasedspatiotemporalfusionmethodforremotesensingimages |
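The abstract above centers on a Top-$K$ sparse self-attention mechanism that keeps only the strongest attention scores per query token, cutting computational cost while preserving critical local features. The snippet below is a minimal, generic sketch of that idea in PyTorch; the class name `TopKSparseSelfAttention`, the head count, and the value of `k` are illustrative assumptions and do not reproduce the paper's actual TopSparseNet (TSN) design.

```python
# Illustrative sketch only: a generic Top-K sparse self-attention layer.
# Shapes, names, and hyperparameters are assumptions, not the EST-STFM implementation.
import torch
import torch.nn as nn


class TopKSparseSelfAttention(nn.Module):
    """Self-attention that keeps only the k largest scores per query token."""

    def __init__(self, dim: int, num_heads: int = 4, k: int = 16):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.k = k
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), e.g. flattened image patches
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k_, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, n, head_dim)

        scores = (q @ k_.transpose(-2, -1)) / self.head_dim ** 0.5  # (b, heads, n, n)

        # Keep only the top-k scores per query; mask the rest before the softmax
        # so most attention weights become exactly zero.
        k_eff = min(self.k, n)
        topk_vals, _ = scores.topk(k_eff, dim=-1)
        threshold = topk_vals[..., -1:].expand_as(scores)
        scores = scores.masked_fill(scores < threshold, float("-inf"))

        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


if __name__ == "__main__":
    layer = TopKSparseSelfAttention(dim=64, num_heads=4, k=8)
    patches = torch.randn(2, 256, 64)   # 2 images, 256 patch tokens each
    print(layer(patches).shape)         # torch.Size([2, 256, 64])
```

Masking sub-threshold scores to negative infinity before the softmax is the usual way Top-K sparsity is imposed without changing the attention layer's interface; how EST-STFM combines this with its multibranch feedforward network and fusion block is described in the paper itself.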