Swin Transformer With Spatial and Local Context Augmentation for Enhanced Semantic Segmentation of Remote Sensing Images

Semantic segmentation of remote sensing images is widely used in crop cover and type analysis and in environmental monitoring. Owing to the specific characteristics of remote sensing images, segmentation requires not only local context but also the global con...

Bibliographic Details
Main Authors: Rong-Xing Ding, Yi-Han Xu, Gang Yu, Wen Zhou, Ding Zhou
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of Signal Processing
Subjects: Swin transformer; deep learning; remote sensing; semantic segmentation
Online Access: https://ieeexplore.ieee.org/document/11011931/
author Rong-Xing Ding
Yi-Han Xu
Gang Yu
Wen Zhou
Ding Zhou
collection DOAJ
description Semantic segmentation of remote sensing images is widely used in crop cover and type analysis and in environmental monitoring. Owing to the specific characteristics of remote sensing images, segmentation requires not only local context; global context information also plays an important role. Inspired by the powerful global modelling capability of the Swin Transformer, we propose LSENet, which follows the encoder-decoder architecture of UNet. In the encoding phase, we propose a spatial enhancement module (SEM), which helps the Swin Transformer further enhance feature extraction by encoding spatial information. In the decoding stage, we propose a local enhancement module (LEM), which is embedded in the Swin Transformer to help the network obtain more local semantic information and thus classify pixels more accurately. The benefit is most visible in edge regions, where adding the LEM yields smoother edges. Experimental results on the Potsdam, Vaihingen, and OpenEarthMap datasets demonstrate the effectiveness of the proposed method. Specifically, the mIoU is 78.58% on the Potsdam dataset, 72.59% on the Vaihingen dataset, and 64.49% on the OpenEarthMap dataset.
format Article
id doaj-art-9ebbfb30ebcb4e91b631fdb01b6b631d
institution OA Journals
issn 2644-1322
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of Signal Processing
spelling doaj-art-9ebbfb30ebcb4e91b631fdb01b6b631d
2025-08-20T02:07:10Z
eng
IEEE
IEEE Open Journal of Signal Processing
2644-1322
2025-01-01
6
608
620
10.1109/OJSP.2025.3573202
11011931
Swin Transformer With Spatial and Local Context Augmentation for Enhanced Semantic Segmentation of Remote Sensing Images
Rong-Xing Ding (0)
Yi-Han Xu (1) https://orcid.org/0000-0002-1986-2650
Gang Yu (2) https://orcid.org/0000-0001-6413-4882
Wen Zhou (3) https://orcid.org/0000-0003-4831-3375
Ding Zhou (4) https://orcid.org/0000-0001-8288-2200
(0) College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing, China
(1) College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing, China
(2) Department of Electronic and Electrical Engineering, The University of Sheffield, Sheffield, U.K.
(3) College of Low Altitude Equipment and Intelligent Control, Guangzhou Maritime University, Guangzhou, China
(4) Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia
https://ieeexplore.ieee.org/document/11011931/
Swin transformer
deep learning
remote sensing
semantic segmentation
title Swin Transformer With Spatial and Local Context Augmentation for Enhanced Semantic Segmentation of Remote Sensing Images
topic Swin transformer
deep learning
remote sensing
semantic segmentation
url https://ieeexplore.ieee.org/document/11011931/