CAU<sup>2</sup>DNet: A Dual-Branch Deep Learning Network and a Dataset for Slum Recognition with Multi-Source Remote Sensing Data
The efficient and precise identification of urban slums is a significant challenge for urban planning and sustainable development, as their morphological diversity and complex spatial distribution make it difficult to use traditional remote sensing inversion methods. Current deep learning (DL) metho...
Saved in:
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-07-01
|
| Series: | Remote Sensing |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2072-4292/17/14/2359 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The efficient and precise identification of urban slums is a significant challenge for urban planning and sustainable development, as their morphological diversity and complex spatial distribution make it difficult to use traditional remote sensing inversion methods. Current deep learning (DL) methods mainly face challenges such as limited receptive fields and insufficient sensitivity to spatial locations when integrating multi-source remote sensing data, and high-quality datasets that integrate multi-spectral and geoscientific indicators to support them are scarce. In response to these issues, this study proposes a DL model (coordinate-attentive U<sup>2</sup>-DeepLab network [CAU<sup>2</sup>DNet]) that integrates multi-source remote sensing data. The model integrates the multi-scale feature extraction capability of U<sup>2</sup>-Net with the global receptive field advantage of DeepLabV3+ through a dual-branch architecture. Thereafter, the spatial semantic perception capability is enhanced by introducing the CoordAttention mechanism, and ConvNextV2 is adopted to optimize the backbone network of the DeepLabV3+ branch, thereby improving the modeling capability of low-resolution geoscientific features. The two branches adopt a decision-level fusion mechanism for feature fusion, which means that the results of each are weighted and summed using learnable weights to obtain the final output feature map. Furthermore, this study constructs the São Paulo slums dataset for model training due to the lack of a multi-spectral slum dataset. This dataset covers 7978 samples of 512 × 512 pixels, integrating high-resolution RGB images, Normalized Difference Vegetation Index (NDVI)/Modified Normalized Difference Water Index (MNDWI) geoscientific indicators, and POI infrastructure data, which can significantly enrich multi-source slum remote sensing data. Experiments have shown that CAU<sup>2</sup>DNet achieves an intersection over union (IoU) of 0.6372 and an F1 score of 77.97% on the São Paulo slums dataset, indicating a significant improvement in accuracy over the baseline model. The ablation experiments verify that the improvements made in this study have resulted in a 16.12% increase in precision. Moreover, CAU<sup>2</sup>DNet also achieved the best results in all metrics during the cross-domain testing on the WHU building dataset, further confirming the model’s generalizability. |
|---|---|
| ISSN: | 2072-4292 |