Ground-Based Remote Sensing Cloud Image Segmentation Using Convolution-MLP Network
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11098898/ |
| Summary: | Recently, multilayer perceptron (MLP)-based methods in computer vision have attracted much attention due to their ability to learn long-range dependencies. However, MLP-based methods usually treat all tokens equally, which makes it difficult to segment challenging cloud regions. In this article, we propose a novel network named convolution-MLP network (Con-MLPNet) for ground-based remote sensing cloud image segmentation, which effectively learns long-range dependencies by combining MLPs with the attention mechanism. To this end, we propose the attention-guided MLPs module to highlight salient features and suppress irrelevant features from the spatial and channel aspects. Meanwhile, unlike existing MLP methods in which long-range dependencies are learned at a single scale, we propose the dilated MLPs (DMLPs) to learn long-range dependencies at different scales by sampling different channels of tokens. Furthermore, we design the parallel dilated MLPs module to integrate multiple DMLPs with different parameters in order to extract multiscale information. We conduct a series of experiments on three public ground-based cloud image segmentation datasets, i.e., TLCDD, SWIMSEG, and TCDD, and the results demonstrate that the proposed Con-MLPNet achieves state-of-the-art performance. Specifically, on the TLCDD dataset, our method surpasses competing methods across all five evaluation metrics, improving on the second-best results by 3.3% in precision, 2.48% in recall, 3.74% in F-score, 1.76% in accuracy, and 4.0% in IoU. |
| ISSN: | 1939-1404, 2151-1535 |
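The record gives only a high-level description of the dilated MLPs, i.e., learning long-range dependencies at different scales "by sampling different channels of tokens." One plausible reading is a token-mixing MLP applied to channel groups sampled at a stride equal to the dilation rate. The sketch below is a hypothetical NumPy illustration of that reading under these assumptions, not the authors' implementation; the function name `dilated_mlp` and its signature are invented for this example.

```python
import numpy as np

def dilated_mlp(tokens, weight, dilation):
    """Hypothetical sketch of a dilated token-mixing MLP (DMLP).

    tokens:   (num_tokens, channels) array of flattened image tokens.
    weight:   (num_tokens, num_tokens) token-mixing matrix.
    dilation: stride used to sample channels into groups.

    Channels are sampled with stride `dilation` into groups; the same
    token-mixing weight is applied across tokens within each group, and
    the groups are re-interleaved so the original channel order is kept.
    A larger dilation yields coarser channel sampling, which is one way
    to read "long-range dependencies at different scales."
    """
    n, c = tokens.shape
    assert c % dilation == 0, "channels must be divisible by the dilation rate"
    out = np.empty_like(tokens)
    for offset in range(dilation):
        group = tokens[:, offset::dilation]        # strided channel sampling
        out[:, offset::dilation] = weight @ group  # mix across tokens
    return out
```

A parallel dilated MLPs module, as described in the summary, would then run several such branches with different `dilation` values and fuse their outputs (e.g., by summation or concatenation) to capture multiscale information.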