Swin Transformer With Spatial and Local Context Augmentation for Enhanced Semantic Segmentation of Remote Sensing Images
Semantic segmentation of remote sensing images is extensively used in crop cover and type analysis and in environmental monitoring. In the semantic segmentation of remote sensing images, owing to the specific characteristics of such images, not only the local context is required, but also the global con...
| Main Authors: | Rong-Xing Ding, Yi-Han Xu, Gang Yu, Wen Zhou, Ding Zhou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Open Journal of Signal Processing |
| Subjects: | Swin transformer, deep learning, remote sensing, semantic segmentation |
| Online Access: | https://ieeexplore.ieee.org/document/11011931/ |
| _version_ | 1850220117129953280 |
|---|---|
| author | Rong-Xing Ding, Yi-Han Xu, Gang Yu, Wen Zhou, Ding Zhou |
| author_sort | Rong-Xing Ding |
| collection | DOAJ |
| description | Semantic segmentation of remote sensing images is extensively used in crop cover and type analysis and in environmental monitoring. Owing to the specific characteristics of remote sensing images, their semantic segmentation requires not only local context; global context information also plays an important role. Inspired by the powerful global modelling capability of the Swin Transformer, we propose LSENet, a network that follows the encoder-decoder architecture of UNet. In the encoding phase, we propose a spatial enhancement module (SEM), which helps the Swin Transformer further enhance feature extraction by encoding spatial information. In the decoding stage, we propose a local enhancement module (LEM), which is embedded in the Swin Transformer to help the network obtain more local semantic information and thus classify pixels more accurately; in edge regions in particular, adding the LEM yields smoother edges. The experimental results on the Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method. Specifically, the mIoU is 78.58% on the Potsdam dataset, 72.59% on the Vaihingen dataset, and 64.49% on the OpenEarthMap dataset. |
| format | Article |
| id | doaj-art-9ebbfb30ebcb4e91b631fdb01b6b631d |
| institution | OA Journals |
| issn | 2644-1322 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Open Journal of Signal Processing |
| spelling | doaj-art-9ebbfb30ebcb4e91b631fdb01b6b631d; 2025-08-20T02:07:10Z; eng; IEEE; IEEE Open Journal of Signal Processing; 2644-1322; 2025-01-01; 6; 608; 620; 10.1109/OJSP.2025.3573202; 11011931; Swin Transformer With Spatial and Local Context Augmentation for Enhanced Semantic Segmentation of Remote Sensing Images; Authors: Rong-Xing Ding (0), Yi-Han Xu (1, https://orcid.org/0000-0002-1986-2650), Gang Yu (2, https://orcid.org/0000-0001-6413-4882), Wen Zhou (3, https://orcid.org/0000-0003-4831-3375), Ding Zhou (4, https://orcid.org/0000-0001-8288-2200); Affiliations, in listed order: College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing, China; College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing, China; Department of Electronic and Electrical Engineering, The University of Sheffield, Sheffield, U.K.; College of Low Altitude Equipment and Intelligent Control, Guangzhou Maritime University, Guangzhou, China; Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia; https://ieeexplore.ieee.org/document/11011931/; Swin transformer, deep learning, remote sensing, semantic segmentation |
| title | Swin Transformer With Spatial and Local Context Augmentation for Enhanced Semantic Segmentation of Remote Sensing Images |
| topic | Swin transformer, deep learning, remote sensing, semantic segmentation |
| url | https://ieeexplore.ieee.org/document/11011931/ |