An Improved MAE-Based Pretraining Method for Urban Public Space Monitoring With Optical Remote Sensing Imagery
Monitoring urban public spaces is a vital component of scientific urban planning. This article proposes an improved masked autoencoder (MAE)-based pretraining method for automatic monitoring of urban public spaces with semantic segmentation of optical satellite remote sensing imagery. Different from...
| Main Authors: | Wentao Wei, Huan Chen, Yu Jiang, Li Fu, Ping Yao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | Masked autoencoder (MAE); remote sensing; semantic segmentation; urban public space |
| Online Access: | https://ieeexplore.ieee.org/document/11059859/ |
| Field | Value |
|---|---|
| author | Wentao Wei; Huan Chen; Yu Jiang; Li Fu; Ping Yao |
| collection | DOAJ |
| description | Monitoring urban public spaces is a vital component of scientific urban planning. This article proposes an improved masked autoencoder (MAE)-based pretraining method for automatic monitoring of urban public spaces with semantic segmentation of optical satellite remote sensing imagery. Most existing vision transformer image encoders are pretrained with MAE on natural images; because of the large domain gap between natural and remote sensing images, they transfer poorly to small-scale remote sensing datasets, which limits fine-tuning performance on downstream tasks. Our method instead performs pretraining directly on the target remote sensing imagery to better capture its feature distribution. Specifically, label information is filled into the masked regions to enhance the model’s image reconstruction capability. In addition, we design a momentum branch to keep feature updates stable and consistent, and we adopt a dynamic masking strategy that reduces reconstruction difficulty in the initial stages of pretraining. This ensures a smooth transition to later stages and significantly improves the accuracy and efficiency of image generation. Experimental results demonstrate that the proposed method significantly outperforms competing MAE-based vision transformer approaches as well as state-of-the-art CNN-based methods on the test dataset. Furthermore, an analysis of per capita public space in Haikou city validates the effectiveness of the proposed method for monitoring urban public spaces. |
| format | Article |
| id | doaj-art-9aaacba9fd784868828ebdc124276c1f |
| institution | Kabale University |
| issn | 1939-1404; 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| volume | 18 |
| pages | 16929-16941 |
| doi | 10.1109/JSTARS.2025.3584546 |
| article_number | 11059859 |
| affiliation | Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China (all five authors) |
| orcid | Wentao Wei: https://orcid.org/0009-0008-3968-8929; Huan Chen: https://orcid.org/0009-0008-8302-3887; Ping Yao: https://orcid.org/0009-0008-7539-5066 |
| record_timestamp | 2025-08-20T03:50:59Z |
| title | An Improved MAE-Based Pretraining Method for Urban Public Space Monitoring With Optical Remote Sensing Imagery |
| topic | Masked autoencoder (MAE); remote sensing; semantic segmentation; urban public space |
| url | https://ieeexplore.ieee.org/document/11059859/ |
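The abstract names two training mechanisms, a momentum branch and a dynamic masking strategy, but does not give their exact form. The sketch below is a generic illustration under stated assumptions, not the authors' implementation. It assumes the dynamic masking strategy raises the mask ratio linearly from a low value (a hypothetical 0.25) to 0.75, the default ratio of the original MAE, so that early reconstruction is easier. It also assumes the momentum branch is an exponential moving average (EMA) of the online weights, as in common teacher-student setups. All function names and default values are hypothetical; weights are plain Python floats so the example runs without a deep-learning framework, and the label-filling step is not modeled.

```python
import random
from typing import Dict, List


def mask_ratio_at(step: int, total_steps: int,
                  start_ratio: float = 0.25, end_ratio: float = 0.75) -> float:
    """Hypothetical dynamic schedule: mask ratio grows linearly with training step.

    A low ratio early hides fewer patches (easier reconstruction); the ratio
    then rises to end_ratio by the final step.
    """
    if total_steps <= 1:
        return end_ratio
    # Fraction of training completed, clamped to [0, 1].
    t = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start_ratio + t * (end_ratio - start_ratio)


def random_patch_mask(num_patches: int, ratio: float,
                      rng: random.Random) -> List[bool]:
    """MAE-style random masking: True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               momentum: float = 0.996) -> None:
    """Assumed momentum branch: teacher <- m * teacher + (1 - m) * student.

    Updates `teacher` in place; both dicts are assumed to share the same keys.
    """
    for name, value in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * value


if __name__ == "__main__":
    rng = random.Random(0)
    # 196 patches corresponds to a 224x224 image split into 16x16 patches.
    for step in (0, 50, 99):
        ratio = mask_ratio_at(step, total_steps=100)
        mask = random_patch_mask(196, ratio, rng)
        print(step, round(ratio, 3), sum(mask))  # 0 0.25 49 / 50 0.503 98 / 99 0.75 147

    teacher = {"w": 1.0}
    ema_update(teacher, {"w": 0.0}, momentum=0.9)
    print(teacher["w"])  # 0.9
```

A high momentum (close to 1) makes the teacher weights change slowly, which is the usual reason such a branch is described as stabilizing feature updates; the linear schedule is only one possible choice, and a cosine or step schedule would serve the same illustrative purpose.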