Classification of Satellite Image Time Series and Aerial Images Based on Multiscale Fusion and Multilevel Supervision

Bibliographic Details
Main Authors: H. Kanyamahanga, M. Dorozynski, F. Rottensteiner
Format: Article
Language: English
Published: Copernicus Publications 2025-07-01
Series: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Online Access: https://isprs-annals.copernicus.org/articles/X-G-2025/477/2025/isprs-annals-X-G-2025-477-2025.pdf
Description
Summary: A large variety of sensors can be used for monitoring processes on the Earth's surface. Different sensors can capture complementary information about the same observed region. For instance, aerial images offer a high spatial resolution but a low temporal resolution, whereas satellite image time series (SITS) capture temporal variations with a high repetition rate, e.g. seasonal changes, but with limited spatial resolution. This paper presents a method to jointly exploit the strengths of SITS and aerial images for land cover classification. In this context, it is a challenge to train a classifier given the large difference in resolutions. We utilise convolutions to extract spatial information and consider self-attention in the temporal dimension for SITS. Additionally, a multi-resolution supervision strategy is proposed, applying auxiliary losses at different stages of the SITS decoder to enhance feature learning. Features extracted from SITS data are fused via a cross-attention module with features determined from aerial images at the same spatial resolution by a SegFormer network before predicting land cover at the geometrical resolution of the aerial image. We perform comparative experiments on an existing benchmark dataset, showing that the convolution- and attention-based fusion of a Sentinel-2 SITS with aerial images improves the classification results by +1.9% in the mean IoU and +2% in the OA compared to a method based on aerial images only.
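
The abstract describes two key components: a cross-attention module that fuses SITS features with aerial-image features, and a multi-level supervision scheme that applies auxiliary losses at several decoder stages. The following is a minimal PyTorch sketch of these two ideas, not the authors' implementation; the module name, tensor shapes, embedding dimension, number of heads and loss weights are all illustrative assumptions.

```python
# Minimal sketch of cross-attention fusion and multi-level supervision,
# loosely following the ideas summarised in the abstract. All names and
# hyperparameters below are assumptions, not the published architecture.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuse aerial-image feature tokens with SITS feature tokens."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Here the aerial-image tokens act as queries and the SITS tokens
        # provide keys and values; the paper may use a different arrangement.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, aerial_feats: torch.Tensor, sits_feats: torch.Tensor) -> torch.Tensor:
        # aerial_feats: (B, N, C) tokens, e.g. from a SegFormer encoder stage
        # sits_feats:   (B, M, C) spatio-temporal tokens from the SITS branch
        fused, _ = self.cross_attn(query=aerial_feats, key=sits_feats, value=sits_feats)
        return self.norm(aerial_feats + fused)  # residual connection


def multilevel_loss(stage_logits, target, weights=(0.2, 0.3, 0.5)):
    # Auxiliary cross-entropy losses at several decoder stages, summed with
    # (illustrative) weights; assumes each stage's logits were upsampled to
    # the resolution of the labels.
    ce = nn.CrossEntropyLoss()
    return sum(w * ce(logits, target) for w, logits in zip(weights, stage_logits))


if __name__ == "__main__":
    fusion = CrossAttentionFusion()
    aerial = torch.randn(2, 1024, 256)   # dummy aerial-image tokens (B, N, C)
    sits = torch.randn(2, 60, 256)       # dummy SITS tokens (B, M, C)
    print(fusion(aerial, sits).shape)    # torch.Size([2, 1024, 256])

    logits = [torch.randn(2, 5, 32, 32) for _ in range(3)]  # 3 decoder stages, 5 classes
    labels = torch.randint(0, 5, (2, 32, 32))
    print(multilevel_loss(logits, labels).item())
```

In this sketch the fused tokens would be reshaped back to a spatial grid and decoded to land-cover labels at the aerial-image resolution; the weighting of the auxiliary losses is a design choice left open by the abstract.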
ISSN: 2194-9042, 2194-9050