An Improved MAE-Based Pretraining Method for Urban Public Space Monitoring With Optical Remote Sensing Imagery

Monitoring urban public spaces is a vital component of scientific urban planning. This article proposes an improved masked autoencoder (MAE)-based pretraining method for automatic monitoring of urban public spaces with semantic segmentation of optical satellite remote sensing imagery. Different from...

Bibliographic Details
Main Authors: Wentao Wei, Huan Chen, Yu Jiang, Li Fu, Ping Yao
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11059859/
collection DOAJ
description Monitoring urban public spaces is a vital component of scientific urban planning. This article proposes an improved masked autoencoder (MAE)-based pretraining method for automatically monitoring urban public spaces through semantic segmentation of optical satellite remote sensing imagery. Most existing vision transformer image encoders are pretrained with the MAE method on natural images; because of the large domain gap between natural images and remote sensing imagery, they transfer poorly to small-scale remote sensing datasets, which limits fine-tuning performance on downstream tasks. In contrast, our method performs pretraining directly on the target remote sensing imagery to better capture its feature distribution. Specifically, label information is filled into masked regions to enhance the model’s image reconstruction capabilities. In addition, we design a momentum branch to ensure the stability and consistency of feature updates, and we adopt a dynamic masking strategy that reduces reconstruction difficulty during the initial stages of pretraining and ensures a smooth transition to later stages, significantly improving the accuracy and efficiency of image generation. Experimental results demonstrate that the proposed method significantly outperforms competing MAE-based vision transformer approaches as well as state-of-the-art CNN-based methods on the test dataset. Furthermore, an analysis of the per capita public space in Haikou city validates the effectiveness of the proposed method for monitoring urban public spaces.
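The description names a dynamic masking strategy and a momentum branch but gives no formulas for either. The sketch below shows one common way such components are built, purely as an illustration: a mask ratio that ramps up linearly over pretraining, and an exponential-moving-average (EMA) update for the momentum branch. The linear schedule, the start and end ratios, the momentum coefficient, and all function names are assumptions, not details taken from the article. The label fill-in step is not sketched here.

```python
import random


def mask_ratio(step, total_steps, start=0.3, end=0.75):
    """Assumed schedule: ramp the mask ratio linearly from `start` to `end`.

    A low ratio early on makes reconstruction easier; the ratio then
    grows toward `end` as pretraining progresses.
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac


def random_patch_mask(num_patches, ratio, rng):
    """Return a list of booleans, True where a patch is masked."""
    num_masked = int(round(num_patches * ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def ema_update(momentum_params, online_params, m=0.99):
    """Assumed momentum-branch update: p_m <- m * p_m + (1 - m) * p_online."""
    return [m * pm + (1.0 - m) * po
            for pm, po in zip(momentum_params, online_params)]


if __name__ == "__main__":
    rng = random.Random(0)
    total = 100
    for step in (0, 50, 100):
        ratio = mask_ratio(step, total)
        mask = random_patch_mask(196, ratio, rng)  # 14 x 14 patches
        print(step, round(ratio, 3), sum(mask))
```

In this sketch the momentum parameters move only a small fraction toward the online parameters at each step, which is the usual reason an EMA branch yields slowly varying, more stable targets.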
id doaj-art-9aaacba9fd784868828ebdc124276c1f
institution Kabale University
issn 1939-1404
2151-1535
spelling Record ID: doaj-art-9aaacba9fd784868828ebdc124276c1f
Record timestamp: 2025-08-20T03:50:59Z
Volume: 18
Pages: 16929-16941
DOI: 10.1109/JSTARS.2025.3584546
IEEE Xplore document: 11059859
Authors: Wentao Wei (https://orcid.org/0009-0008-3968-8929), Huan Chen (https://orcid.org/0009-0008-8302-3887), Yu Jiang, Li Fu, Ping Yao (https://orcid.org/0009-0008-7539-5066)
Affiliation (all authors): Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
topic Masked autoencoder (MAE)
remote sensing
semantic segmentation
urban public space