Shuffle window transformer DeepLabV3+: a lightweight convolutional neural network and transformer based hybrid semantic segmentation network
Semantic segmentation is a critical task in computer vision. Constructing complex semantic segmentation models with high accuracy, low spatial occupancy, and low computational complexity remains a challenge. To address this, this paper proposes a semantic segmentation network based on a hybrid archi...
| Main Authors: | Yane Li, Zhichao Chen, Hongxia Qi, Ming Fan, Lihua Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IOP Publishing, 2025-01-01 |
| Series: | Machine Learning: Science and Technology |
| Subjects: | semantic segmentation; shuffle window transformer; convolutional neural network; DeepLabV3+ |
| Online Access: | https://doi.org/10.1088/2632-2153/add853 |
| _version_ | 1849328427778179072 |
|---|---|
| author | Yane Li, Zhichao Chen, Hongxia Qi, Ming Fan, Lihua Li |
| author_sort | Yane Li |
| collection | DOAJ |
| description | Semantic segmentation is a critical task in computer vision. Constructing complex semantic segmentation models with high accuracy, low spatial occupancy, and low computational complexity remains a challenge. To address this, this paper proposes a semantic segmentation network based on a hybrid architecture combining a convolutional neural network and a Transformer, named shuffle window transformer DeepLabV3+ (SWT-DeepLabV3+). The network introduces a new module, the SWT. With a fixed window size, the SWT integrates window attention (WA) and shuffle WA mechanisms to achieve cross-window global context modeling with linear computational complexity. Additionally, the atrous spatial pyramid pooling (ASPP) is enhanced with strip pooling to construct a strip ASPP, which extracts both regular and irregular multi-scale (MS) features. The network also adopts adaptive spatial feature fusion in the shallow layers, where dynamic adjustment of MS feature weights improves the backbone network’s ability to capture shallow discriminative features. Experimental results on three public datasets (PASCAL VOC 2012, Cityscapes, and CamVid) show that SWT-DeepLabV3+ achieves strong segmentation performance with a lower parameter count and computational cost, validating the model’s capability to process efficiently while maintaining high accuracy. |
| format | Article |
| id | doaj-art-7d24bcded2254dbdba8ae7259add2c3e |
| institution | Kabale University |
| issn | 2632-2153 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IOP Publishing |
| record_format | Article |
| series | Machine Learning: Science and Technology |
| spelling | doaj-art-7d24bcded2254dbdba8ae7259add2c3e; 2025-08-20T03:47:36Z; eng; IOP Publishing; Machine Learning: Science and Technology; ISSN 2632-2153; 2025-01-01; vol. 6, no. 2, art. 025039; doi:10.1088/2632-2153/add853; Yane Li (https://orcid.org/0000-0003-0065-7750), College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, People’s Republic of China; Zhichao Chen, College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, People’s Republic of China; Hongxia Qi, Department of Echocardiography, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, People’s Republic of China; Ming Fan (https://orcid.org/0000-0002-5626-7076), Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People’s Republic of China; Lihua Li (https://orcid.org/0000-0003-0435-6453), Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People’s Republic of China; https://doi.org/10.1088/2632-2153/add853; keywords: semantic segmentation, shuffle window transformer, convolutional neural network, DeepLabV3+ |
| title | Shuffle window transformer DeepLabV3+: a lightweight convolutional neural network and transformer based hybrid semantic segmentation network |
| topic | semantic segmentation; shuffle window transformer; convolutional neural network; DeepLabV3+ |
| url | https://doi.org/10.1088/2632-2153/add853 |
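
The description states that, for a fixed window size, combining window attention (WA) with shuffle WA gives cross-window context at linear computational cost. The record does not specify how the shuffle is performed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation: tokens are laid out in 1D, and the shuffle is assumed to be a strided regrouping in which each group draws one token from every contiguous window. The function names `window_groups`, `shuffle_window_groups`, and `attention_pairs` are hypothetical.

```python
def window_groups(num_tokens, window):
    """Contiguous windows: tokens attend only to neighbours in the same window."""
    assert num_tokens % window == 0
    return [list(range(s, s + window)) for s in range(0, num_tokens, window)]


def shuffle_window_groups(num_tokens, window):
    """ASSUMED shuffle: strided groups of size `window`, so each group
    mixes tokens that sit far apart in the original layout."""
    assert num_tokens % window == 0
    stride = num_tokens // window  # also the number of groups
    return [list(range(g, num_tokens, stride)) for g in range(stride)]


def attention_pairs(groups):
    """Number of query-key pairs when attention is computed inside each group."""
    return sum(len(g) ** 2 for g in groups)


if __name__ == "__main__":
    for n in (64, 128, 256):
        local = attention_pairs(window_groups(n, 4))
        shuffled = attention_pairs(shuffle_window_groups(n, 4))
        # Both grow as n * window, while global attention grows as n * n.
        print(n, local, shuffled, n * n)
```

With a window of size w over N tokens, each grouping costs (N / w) · w² = N · w query-key pairs, so doubling N doubles the cost, whereas global attention over all N tokens costs N² pairs. Alternating the two groupings lets information cross window boundaries without ever forming the full N × N attention map.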