MGFNet: An MLP-dominated gated fusion network for semantic segmentation of high-resolution multi-modal remote sensing images


Bibliographic Details
Main Authors: Kan Wei, JinKun Dai, Danfeng Hong, Yuanxin Ye
Format: Article
Language:English
Published: Elsevier 2024-12-01
Series:International Journal of Applied Earth Observation and Geoinformation
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S1569843224005971
Description
Summary:The heterogeneity and complexity of multimodal data in high-resolution remote sensing images significantly challenge existing cross-modal networks in fusing the complementary information of high-resolution optical and synthetic aperture radar (SAR) images for precise semantic segmentation. To address this issue, this paper proposes a multi-layer perceptron (MLP) dominated gated fusion network (MGFNet). MGFNet consists of three modules: a multi-path feature extraction network, an MLP-gate fusion (MGF) module, and a decoder. Initially, MGFNet independently extracts features from high-resolution optical and SAR images while preserving spatial information. Then, the MGF module combines the multi-modal features through channel attention and gated fusion stages, using an MLP as a gate to exploit complementary information and filter out redundant data. Additionally, we introduce a novel high-resolution multimodal remote sensing dataset, YESeg-OPT-SAR, with a spatial resolution of 0.5 m. To evaluate MGFNet, we compare it with several state-of-the-art (SOTA) models on the YESeg-OPT-SAR and Pohang datasets, both of which are high-resolution multi-modal datasets. The experimental results demonstrate that MGFNet achieves higher evaluation metrics than the other models, indicating its effectiveness in multi-modal feature fusion for segmentation. The source code and data are available at https://github.com/yeyuanxin110/YESeg-OPT-SAR.
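The abstract's core idea, an MLP acting as a gate that weighs complementary optical and SAR features, can be illustrated with a minimal NumPy sketch. The exact MGF design (channel-attention stage, layer widths, normalization) is not given in this record, so the two-layer MLP, the pooled channel descriptors, and the convex-combination fusion rule below are assumptions, not the authors' implementation:

```python
import numpy as np

def mlp_gate_fusion(opt_feat, sar_feat, w1, b1, w2, b2):
    """Hypothetical gated fusion of optical and SAR feature maps (C, H, W).

    An MLP maps pooled channel descriptors of both modalities to a
    per-channel gate in (0, 1); the gate then mixes the two feature maps.
    """
    # Global average pool each modality to a channel descriptor: (C,)
    opt_desc = opt_feat.mean(axis=(1, 2))
    sar_desc = sar_feat.mean(axis=(1, 2))
    x = np.concatenate([opt_desc, sar_desc])          # (2C,)

    # Two-layer MLP gate: ReLU hidden layer, sigmoid output per channel
    h = np.maximum(0.0, w1 @ x + b1)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))       # (C,)

    # Channel-wise convex combination of the two modalities
    g = gate[:, None, None]
    return g * opt_feat + (1.0 - g) * sar_feat

# Usage with random features and weights (shapes are illustrative only)
rng = np.random.default_rng(0)
C, H, W, hidden = 4, 8, 8, 16
opt = rng.standard_normal((C, H, W))
sar = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((hidden, 2 * C)); b1 = np.zeros(hidden)
w2 = 0.1 * rng.standard_normal((C, hidden));     b2 = np.zeros(C)
fused = mlp_gate_fusion(opt, sar, w1, b1, w2, b2)     # shape (C, H, W)
```

Because the gate lies strictly in (0, 1), each fused value stays between the corresponding optical and SAR values, which is one simple way a gate can suppress redundant information from either modality.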
ISSN:1569-8432