GeoMM-SSL: Integrating Geospatial Object Relations in Multimodal Self-Supervised Learning for Semantic Segmentation of Remote Sensing Images

Bibliographic Details
Main Authors: Yang Liu, Tong Zhang, Yanru Huang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11095360/
Description
Summary: Self-supervised learning (SSL) has emerged as a promising pretraining approach that learns latent, task-agnostic representations without labels. Pretrained SSL models for semantic segmentation of remote sensing images have recently attracted increasing attention. However, current pretrained SSL models tend to focus solely on either global semantics or local spatial representations. In addition, they ignore the spatial relationships among objects, which could help infer co-occurrence between geospatial objects. In this article, we propose a multimodal pretrained SSL framework (GeoMM-SSL) that explicitly integrates geospatial object relations. The proposed framework comprises four components: a teacher-student architecture with residual gated guidable attention units as the backbone; a multihead graph attention network that encodes prior knowledge of geospatial object relations; a multimodal representation fusion module that facilitates mutual learning between the visual features of remote sensing images and the topological features of geospatial object relations; and a multilevel loss function that evaluates the model at multiple levels, enabling it to learn data representations at the pixel, object, and global levels. We compared GeoMM-SSL with 13 existing SSL methods in comprehensive experiments on ten public semantic segmentation datasets; GeoMM-SSL achieved the best results.
ISSN: 1939-1404, 2151-1535
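
The summary names a multihead graph attention network that encodes geospatial object relations but does not give its formulation. As a rough illustration only, the sketch below implements a generic GAT-style multihead attention layer over a small object graph in plain Python. It is not the authors' implementation: the function names, the toy graph, the feature sizes, and the random weights are all hypothetical, and the actual GeoMM-SSL layer may differ.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, x):
    # W is an (out_dim x in_dim) matrix stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_head(features, adjacency, W, a):
    """One generic GAT-style head (an assumption, not the paper's layer).

    features:  list of N node feature vectors (one per geospatial object)
    adjacency: N x N 0/1 matrix; every node needs a self-loop
    W:         (out_dim x in_dim) projection matrix
    a:         attention vector of length 2 * out_dim
    """
    h = [matvec(W, x) for x in features]
    n = len(h)
    out = []
    for i in range(n):
        neighbors = [j for j in range(n) if adjacency[i][j]]
        # Unnormalised score for each edge (i, j): LeakyReLU(a . [h_i || h_j]).
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, h[i] + h[j])))
            for j in neighbors
        ]
        # Numerically stable softmax over the neighbourhood of node i.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]
        # Attention-weighted sum of the projected neighbour features.
        out.append([
            sum(alpha * h[j][k] for alpha, j in zip(alphas, neighbors))
            for k in range(len(h[i]))
        ])
    return out


def multihead_graph_attention(features, adjacency, heads):
    """Run several heads and concatenate their outputs per node."""
    per_head = [attention_head(features, adjacency, W, a) for W, a in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(features))]


# Hypothetical toy graph: 3 objects with 4-dim features; edges 0-1 and 1-2.
random.seed(0)
features = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
heads = [
    (
        [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)],  # W: 2 x 4
        [random.uniform(-1, 1) for _ in range(4)],                      # a: 2 * 2
    )
    for _ in range(2)
]
node_embeddings = multihead_graph_attention(features, adjacency, heads)
```

With two heads of output size 2, each object ends up with a 4-dimensional embedding that mixes its own features with those of related objects. In a framework like the one described, such topological embeddings would then be fused with visual features; that fusion step is not sketched here.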