An Improved MAE-Based Pretraining Method for Urban Public Space Monitoring With Optical Remote Sensing Imagery
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11059859/ |
| Summary: | Monitoring urban public spaces is a vital component of scientific urban planning. This article proposes an improved masked autoencoder (MAE)-based pretraining method for automatic monitoring of urban public spaces through semantic segmentation of optical satellite remote sensing imagery. Most existing vision transformer image encoders are pretrained with the MAE method on natural images; the large domain gap between natural images and remote sensing imagery makes them difficult to transfer to small-scale remote sensing datasets and limits fine-tuning performance on downstream tasks. In contrast, our method performs pretraining directly on the target remote sensing imagery to better capture its feature distribution. Specifically, label information is filled into masked regions to enhance the model’s image reconstruction capabilities. In addition, we design a momentum branch to ensure the stability and consistency of feature updates and adopt a dynamic masking strategy to reduce reconstruction difficulty during the initial stages of pretraining. This ensures a smooth transition to later stages, significantly improving the accuracy and efficiency of image generation. Experimental results demonstrate that the proposed method significantly outperforms competing MAE-based vision transformer approaches as well as state-of-the-art CNN-based methods on the test dataset. Furthermore, an analysis of the per capita public space in Haikou city validates the effectiveness of the proposed method for monitoring urban public spaces. |
|---|---|
| ISSN: | 1939-1404 2151-1535 |
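
The summary mentions a dynamic masking strategy that eases reconstruction early in pretraining, and a momentum branch that stabilizes feature updates. The record does not give the schedule, the masking ratios, or the update rule, so the following is only a minimal sketch under stated assumptions: a linear ramp of the masking ratio and an exponential-moving-average (EMA) update, both common choices in MAE-style and momentum-encoder training. The names `mask_ratio`, `sample_mask`, `ema_update` and all default values are hypothetical and are not taken from the article.

```python
import random


def mask_ratio(step, total_steps, start=0.4, end=0.75):
    """Masking ratio at a given step.

    Assumption: a linear ramp from `start` to `end`; the article's actual
    schedule and ratios are not stated in this record.
    """
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return start + frac * (end - start)


def sample_mask(num_patches, ratio, rng):
    """Return one boolean per patch; True marks a masked patch."""
    num_masked = int(round(num_patches * ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def ema_update(momentum_params, online_params, m=0.99):
    """EMA update of the momentum branch from the online branch.

    Assumption: the standard form m * momentum + (1 - m) * online;
    the coefficient 0.99 is illustrative only.
    """
    return [m * t + (1.0 - m) * o for t, o in zip(momentum_params, online_params)]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50, 99):
        ratio = mask_ratio(step, total_steps=100)
        mask = sample_mask(196, ratio, rng)  # 196 = 14 x 14 patches, illustrative
        print(step, round(ratio, 3), sum(mask))
```

Under these assumptions, few patches are hidden at the start of pretraining and the ratio rises toward the final value, while the momentum parameters follow the online parameters slowly. How label information is filled into the masked regions is not sketched here, since the record gives no detail on it.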