Classification of Satellite Image Time Series and Aerial Images Based on Multiscale Fusion and Multilevel Supervision
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Copernicus Publications, 2025-07-01 |
| Series: | ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences |
| Online Access: | https://isprs-annals.copernicus.org/articles/X-G-2025/477/2025/isprs-annals-X-G-2025-477-2025.pdf |
| Summary: | A large variety of sensors can be used for monitoring processes on the Earth’s surface. Different sensors can capture complementary information about the same observed region. For instance, aerial images offer a high spatial resolution but a low temporal resolution, whereas satellite image time series (SITS) capture temporal variations, e.g. seasonal changes, with a high repetition rate but limited spatial resolution. This paper presents a method to jointly exploit the strengths of SITS and aerial images for land cover classification. In this context, training a classifier is challenging given the large difference in resolution. We utilise convolutions to extract spatial information and apply self-attention in the temporal dimension for SITS. Additionally, a multi-resolution supervision strategy is proposed, applying auxiliary losses at different stages of the SITS decoder to enhance feature learning. Features extracted from the SITS data are fused, via a cross-attention module, with features that a SegFormer network extracts from the aerial images at the same spatial resolution; land cover is then predicted at the geometrical resolution of the aerial image. We perform comparative experiments on an existing benchmark dataset, showing that the convolution- and attention-based fusion of a Sentinel-2 SITS with an aerial image improves the classification results by +1.9% in mean IoU and +2% in overall accuracy (OA) compared to a method based on aerial images only. |
|---|---|
| ISSN: | 2194-9042 2194-9050 |
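
The summary reports its gains in mean IoU and OA. As a reader aid, the following is a minimal sketch of how these two metrics are commonly computed from a confusion matrix. It is not taken from the paper: the function names and the toy label maps are illustrative assumptions, and the benchmark's own evaluation protocol (e.g. ignored classes) may differ.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count (reference, predicted) label pairs; rows are reference classes."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def overall_accuracy(cm):
    """Share of pixels whose predicted class equals the reference class."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Average of per-class intersection over union, skipping absent classes."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return (tp[valid] / union[valid]).mean()

# Toy 2x3 label maps with three classes (hypothetical data, not from the paper).
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print("OA:", overall_accuracy(cm))
print("mean IoU:", mean_iou(cm))
```

Because mean IoU averages over classes while OA averages over pixels, rare classes weigh more heavily in the former, which is one reason the two reported gains need not coincide.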