GLFFNet: Global–Local Feature Fusion Network for High-Resolution Remote Sensing Image Semantic Segmentation
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Remote Sensing |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2072-4292/17/6/1019 |
| Summary: | Hybrid models that combine convolutional neural networks (CNNs) with Transformers can extract features containing both global and local information, but they still face two challenges in the semantic segmentation of high-resolution remote sensing (HR<sup>2</sup>S) images. First, detailed information is lost during encoding, so the extracted features are underused. Second, local and global context information is fused ineffectively, which degrades segmentation performance. To address both challenges, we propose a dual-branch network, the global–local feature fusion network (GLFFNet), for HR<sup>2</sup>S image semantic segmentation. A residual network (ResNet) serves as the main branch and extracts local features. Mamba, a recent architecture based on state space models, has shown significant potential in image semantic segmentation; because it handles long-range relationships with linear computational complexity and relatively high speed, we introduce VMamba as an auxiliary branch encoder that supplies global information to the main branch. To use this global information efficiently, we propose a multi-scale feature refinement (MSFR) module that reduces the loss of detail during global feature extraction. We also develop a semantic bridging fusion (SBF) module that promotes the full integration of global and local features, yielding more comprehensive and refined feature representations. Comparative experiments on three public datasets demonstrate the segmentation accuracy and application potential of GLFFNet: it achieves mIoU scores of 84.01% on ISPRS Vaihingen, 87.54% on ISPRS Potsdam, and 54.73% on LoveDA, with mF1 scores of 91.11%, 93.23%, and 70.07% on the respective datasets. |
|---|---|
| ISSN: | 2072-4292 |
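
The summary reports results as mIoU and mF1. As a reading aid, the sketch below shows how these two metrics are commonly computed from a pixel-level confusion matrix. It is not taken from the article: the function names and the toy 3-class matrix are hypothetical, and this record does not state which classes the authors average over, so the sketch simply averages over all classes.

```python
def per_class_iou_f1(confusion):
    """Return per-class IoU and F1 lists.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j (illustrative convention).
    """
    n = len(confusion)
    ious, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true otherwise
        iou_den = tp + fp + fn
        f1_den = 2 * tp + fp + fn
        # A class with no true or predicted pixels scores 0.0 here (an assumption).
        ious.append(tp / iou_den if iou_den else 0.0)
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
    return ious, f1s


def mean_scores(confusion):
    """Return (mIoU, mF1): unweighted means of the per-class scores."""
    ious, f1s = per_class_iou_f1(confusion)
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Toy 3-class confusion matrix, made up for illustration only.
confusion = [
    [50, 5, 5],
    [10, 30, 0],
    [0, 10, 40],
]
miou, mf1 = mean_scores(confusion)
print(f"mIoU = {miou:.4f}, mF1 = {mf1:.4f}")
```

For any single class the two scores are linked by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU. This is consistent with the mF1 values in the summary being higher than the corresponding mIoU values.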