WSDC-ViT: a novel transformer network for pneumonia image classification based on windows scalable attention and dynamic rectified linear unit convolutional modules

Abstract Accurate differential diagnosis of pneumonia remains a challenging task, as different types of pneumonia require distinct treatment strategies. Early and precise diagnosis is crucial for minimizing the risk of misdiagnosis and for effectively guiding clinical decision-making and monitoring...

Bibliographic Details
Main Authors:	Yu Gu, Haotian Bai, Meng Chen, Lidong Yang, Baohua Zhang, Jing Wang, Xiaoqi Lu, Jianjun Li, Xin Liu, Dahua Yu, Ying Zhao, Siyuan Tang, Qun He
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-07-01
Series:	Scientific Reports
Subjects:	Deep learning; Pneumonia; Medical image classification; Vision transformer network; Window interaction; Convolution-Based module
Online Access:	https://doi.org/10.1038/s41598-025-12117-0

collection	DOAJ
description	Abstract Accurate differential diagnosis of pneumonia remains a challenging task, as different types of pneumonia require distinct treatment strategies. Early and precise diagnosis is crucial for minimizing the risk of misdiagnosis and for effectively guiding clinical decision-making and monitoring treatment response. This study proposes the WSDC-ViT network to enhance computer-aided pneumonia detection and alleviate the diagnostic workload for radiologists. Unlike existing models such as Swin Transformer or CoAtNet, which primarily improve attention mechanisms through hierarchical designs or convolutional embedding, WSDC-ViT introduces a novel architecture that simultaneously enhances global and local feature extraction through a scalable self-attention mechanism and convolutional refinement. Specifically, the network integrates a scalable self-attention mechanism that decouples the query, key, and value dimensions to reduce computational overhead and improve contextual learning, while an interactive window-based attention module further strengthens long-range dependency modeling. Additionally, a convolution-based module equipped with a dynamic ReLU activation function is embedded within the transformer encoder to capture fine-grained local details and adaptively enhance feature expression. Experimental results demonstrate that the proposed method achieves an average classification accuracy of 95.13% and an F1-score of 95.63% on a chest X-ray dataset, along with 99.36% accuracy and a 99.34% F1-score on a CT dataset. These results highlight the model’s superior performance compared to existing automated pneumonia classification approaches, underscoring its potential clinical applicability.
id	doaj-art-87e9115a91db4084aa0f047b6294d993
institution	DOAJ
issn	2045-2322
spelling	Record timestamp 2025-08-20T03:04:39Z; language code eng; Scientific Reports, volume 15, issue 1, pages 1-21.
affiliations	School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology: Yu Gu, Haotian Bai, Meng Chen, Lidong Yang, Xiaoqi Lu, Jianjun Li, Xin Liu, Ying Zhao, Siyuan Tang, Qun He
	School of Automation and Electrical Engineering, Inner Mongolia University of Science and Technology: Baohua Zhang, Dahua Yu
	School of Information and Electronics, Beijing Institute of Technology: Jing Wang
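Illustrative note on the dynamic ReLU mentioned in the abstract. The record does not say which dynamic ReLU formulation WSDC-ViT uses, so the sketch below is only an assumption: it follows the general idea of a piecewise-linear activation whose slopes and intercepts are computed from the input itself through global average pooling. The function name `dynamic_relu`, the `weights` matrix, and the default scaling factors are hypothetical and are not taken from the article.

```python
import math


def dynamic_relu(x, weights, lambda_a=1.0, lambda_b=0.5):
    """Input-dependent piecewise-linear activation (illustrative sketch).

    x:       list of channels, each a flat list of activations.
    weights: hypothetical 4 x C matrix; its rows produce the residuals
             for (a1, a2, b1, b2), shared by all channels.
    """
    # Global average pooling: one context value per channel.
    context = [sum(channel) / len(channel) for channel in x]

    # Linear hyper-function followed by 2*sigmoid(z) - 1, written as
    # tanh(z / 2) so that large |z| cannot overflow.
    theta = []
    for row in weights:
        z = sum(w * c for w, c in zip(row, context))
        theta.append(math.tanh(z / 2.0))

    # Residuals are added to the plain-ReLU defaults a = (1, 0), b = (0, 0).
    a1 = 1.0 + lambda_a * theta[0]
    a2 = 0.0 + lambda_a * theta[1]
    b1 = lambda_b * theta[2]
    b2 = lambda_b * theta[3]

    # The output is the maximum of the two linear pieces.
    return [[max(a1 * v + b1, a2 * v + b2) for v in channel] for channel in x]


if __name__ == "__main__":
    features = [[-1.0, 2.0], [0.5, -3.0]]
    zero_weights = [[0.0, 0.0] for _ in range(4)]
    # With an all-zero hyper-function the activation reduces to plain ReLU.
    print(dynamic_relu(features, zero_weights))
```

With all-zero weights the residuals vanish and the function behaves like an ordinary ReLU; non-zero weights let the pooled context of each input change the slopes and intercepts, which is the sense in which such an activation can "adaptively enhance feature expression".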

Similar Items