FML-Swin: An Improved Swin Transformer Segmentor for Remote Sensing Images

Semantic segmentation of urban remote sensing images is a very challenging task. Due to the complex background, occlusion overlap and small scale target of urban remote sensing image, the semantic segmentation results have some defects such as target confusion and similarity, target boundary ambigui...

Bibliographic Details
Main Authors: Tianren Wu, Wenqin Deng, Rui Lin, Junzhe Jiang, Xueyun Chen
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
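Subjects: Swin transformer; remote sensing images; semantic segmentation; feature interactive fusion; multi-scale detail sensing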
Online Access: https://ieeexplore.ieee.org/document/10966862/
author Tianren Wu
Wenqin Deng
Rui Lin
Junzhe Jiang
Xueyun Chen
collection DOAJ
description Semantic segmentation of urban remote sensing images is challenging. Because such images have complex backgrounds, occlusion and overlap, and small-scale targets, segmentation results suffer from confusion between similar targets, ambiguous target boundaries, and omission of small-scale targets. To address these problems, a feature-interactive fusion and multi-scale detail sensing lightweight enhanced Swin Transformer (FML-Swin) is proposed. The model includes several key components: a feature interactive fusion transformer (FIFT) module, which enhances the model’s focus on current channel features; a multi-scale detail sensing (MSDS) module, designed to capture small-scale features and details in remote sensing images; and a lightweight enhanced squeeze excitation (LESE) module, which enriches the semantic feature information contained in the input image while maintaining a lightweight design. With limited training rounds, the model achieves an mIoU of 78.58 on the multi-class semantic segmentation task of the Potsdam dataset, exceeding SegNeXt by 0.49. On the multi-class semantic segmentation task of the Vaihingen dataset, it achieves an mIoU of 74.75, exceeding SegNeXt by 0.17. These results demonstrate the validity of the model.
format Article
id doaj-art-9e0e723dd5ea4d13882df2d44b540f59
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-9e0e723dd5ea4d13882df2d44b540f59 | 2025-08-20T03:14:13Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2025-01-01 | vol. 13, pp. 66931-66943 | DOI 10.1109/ACCESS.2025.3561325 | 10966862
FML-Swin: An Improved Swin Transformer Segmentor for Remote Sensing Images
Tianren Wu (https://orcid.org/0009-0000-9718-8289)
Wenqin Deng (https://orcid.org/0009-0006-6643-2182)
Rui Lin (https://orcid.org/0009-0002-3452-9757)
Junzhe Jiang (https://orcid.org/0009-0004-4686-7520)
Xueyun Chen (https://orcid.org/0000-0002-7452-0223)
School of Electrical Engineering, Guangxi University, Nanning, China (all five authors)
https://ieeexplore.ieee.org/document/10966862/
Swin transformer; remote sensing images; semantic segmentation; feature interactive fusion; multi-scale detail sensing
title FML-Swin: An Improved Swin Transformer Segmentor for Remote Sensing Images
topic Swin transformer
remote sensing images
semantic segmentation
feature interactive fusion
multi-scale detail sensing
url https://ieeexplore.ieee.org/document/10966862/
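
Note on the metric: the description above reports results as mIoU (mean intersection over union). The sketch below shows one common way to compute mIoU from per-pixel labels; it is not taken from the article. The function names `confusion_matrix` and `mean_iou` are hypothetical, and the record does not state the article's exact evaluation protocol (for example, which classes are ignored or whether scores are scaled to percentages), so treat this as an illustration under those assumptions.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the labels nor the predictions are
    skipped (an assumption; evaluation toolkits differ on this point).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six flattened "pixels".
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(labels, preds, num_classes=3)
score = mean_iou(cm)  # per-class IoUs are 1/3, 2/3 and 1/2
```

In this toy case the mean of the three per-class IoUs is 0.5; a score reported as 78.58 would correspond to 0.7858 on this 0-to-1 scale, assuming the article reports percentages.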