MFPI-Net: A Multi-Scale Feature Perception and Interaction Network for Semantic Segmentation of Urban Remote Sensing Images
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Sensors |
| Online Access: | https://www.mdpi.com/1424-8220/25/15/4660 |
| Summary: | To improve semantic segmentation of complex urban remote sensing images, which exhibit multi-scale object distributions, inter-class similarity, and omission of small objects, this paper proposes MFPI-Net, an encoder–decoder-based semantic segmentation network. It includes four core modules: a Swin Transformer backbone encoder, a diverse dilation rates attention shuffle decoder (DDRASD), a multi-scale convolutional feature enhancement module (MCFEM), and a cross-path residual fusion module (CPRFM). The Swin Transformer extracts multi-level global semantic features through its hierarchical structure and window attention mechanism. Within the DDRASD, the diverse dilation rates attention (DDRA) block combines convolutions with diverse dilation rates and channel-coordinate attention to enhance multi-scale contextual awareness, while the Shuffle Block increases resolution via pixel rearrangement and avoids checkerboard artifacts. The MCFEM enhances local feature modeling through parallel multi-kernel convolutions, complementing the Swin Transformer’s global perception capability. The CPRFM employs multi-branch convolutions and a residual multiplication–addition fusion mechanism to strengthen interactions among multi-source features, thereby improving the recognition of small objects and similar categories. Experiments on the ISPRS Vaihingen and Potsdam datasets show that MFPI-Net outperforms mainstream methods, achieving 82.57% and 88.49% mIoU, respectively, which validates its segmentation performance on urban remote sensing imagery. |
|---|---|
| ISSN: | 1424-8220 |
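
The summary states that the Shuffle Block raises resolution "via pixel rearrangement". The record gives no implementation details, so the following is only a minimal sketch of the generic pixel-shuffle (sub-pixel) rearrangement, assuming the common channel-first layout in which `r*r` input channels are folded into one output channel of `r` times the height and width. The function name `pixel_shuffle`, the upscale factor `r`, and the toy input are illustrative assumptions and are not taken from the paper.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Generic pixel-shuffle sketch; this is an assumption about what
    "pixel rearrangement" means, not the paper's actual Shuffle Block.
    """
    in_channels = len(x)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(x[0]), len(x[0][0])
    out_channels = in_channels // (r * r)

    # Allocate the upsampled output, filled with zeros.
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]

    for c in range(out_channels):
        for h in range(height):
            for w in range(width):
                # Each group of r*r input channels supplies one r-by-r
                # block of output pixels at location (h, w).
                for i in range(r):
                    for j in range(r):
                        src = x[c * r * r + i * r + j][h][w]
                        out[c][h * r + i][w * r + j] = src
    return out


# Hypothetical toy input: four 1x2 channels become one 2x4 channel.
features = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
upsampled = pixel_shuffle(features, 2)
```

Because every output pixel is copied from exactly one input value, the rearrangement involves no overlapping kernel footprints, which is the usual explanation for why this kind of upsampling avoids the checkerboard artifacts associated with transposed convolutions.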