DAViT: A Domain-Adapted Vision Transformer for Automated Pneumonia Detection and Explanation Using Chest X-Ray Images

Bibliographic Details
Main Authors:	Michael Fu, Chakkrit Tantithamthavorn, Trung Le
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Domain adaptation in vision transformers; deep learning for chest X-ray images; explainable AI for X-ray images; pneumonia detection; pneumonia type classification
Online Access:	https://ieeexplore.ieee.org/document/11031440/

Abstract
Pneumonia, a leading cause of mortality worldwide, especially among children under five, is typically diagnosed via chest X-rays. However, detection is challenging because expert radiologists must discern subtle patterns. Artificial intelligence (AI) offers a scalable alternative by automating diagnosis through deep learning (DL) models. Despite progress, current methods face two key limitations: 1) reliance on CNNs, which capture local features but may overlook global ones, and 2) the use of models pre-trained on natural-image datasets such as ImageNet, which lack the contextual relevance of medical imaging and therefore yield suboptimal performance. To address these challenges, we propose DAViT (Domain-Adapted Vision Transformer), a hybrid architecture that combines Vision Transformers (ViTs) and shallow CNNs with domain adaptation. The ViT leverages self-attention to capture global features, while the CNN extracts local ones. To mitigate domain differences, we adapt the model using a diverse chest X-ray dataset. We evaluate DAViT on a real-world dataset of 5,856 chest X-rays. The results demonstrate that DAViT achieves state-of-the-art performance for pneumonia detection, with a 97% F1-score and 96% AUC, outperforming twelve baseline methods. For pneumonia type classification, DAViT achieves an 81% F1-score and 84% AUC, outperforming baselines by 25% to 74%. An ablation study highlights the critical contributions of the domain adaptation, ViT, and CNN components, which collectively enhance performance by 21%. Finally, we apply Grad-CAM on top of DAViT to generate interpretable heatmaps that highlight relevant areas for bacterial and viral pneumonia cases, providing insights to assist medical practitioners in decision-making. These findings indicate the potential of DAViT to assist clinicians in pneumonia diagnosis through improved model accuracy and interpretability. The training code and pre-trained models are available at https://github.com/awsm-research/DAViT
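The abstract contrasts the ViT branch (self-attention, global features) with the shallow CNN branch (local features). The toy sketch below, in plain Python, illustrates that contrast on a 1-D row of patch embeddings. It is not DAViT's implementation: the identity query/key/value projections, the 3-tap averaging kernel, and the concatenation rule in the hypothetical `hybrid_features` helper are simplifying assumptions. The authors' actual code is in the linked repository.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections (no learned weights), so every
    output token is a softmax-weighted mix of ALL input tokens (global view).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def local_conv(tokens: List[Vector], kernel: Vector) -> List[Vector]:
    """1-D convolution along the token axis with zero padding.

    Each output token only sees len(kernel) neighbouring tokens (local view).
    """
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        acc = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                for c in range(d):
                    acc[c] += w * tokens[j][c]
        out.append(acc)
    return out


def hybrid_features(tokens: List[Vector], kernel: Vector) -> List[Vector]:
    """Hypothetical fusion: concatenate global and local features per token."""
    global_feats = self_attention(tokens)
    local_feats = local_conv(tokens, kernel)
    return [g + l for g, l in zip(global_feats, local_feats)]


if __name__ == "__main__":
    # Eight 2-D "patch embeddings"; only the first patch carries a signal.
    patches = [[1.0, 0.0]] + [[0.0, 0.0] for _ in range(7)]
    print(self_attention(patches)[7])        # [0.125, 0.0]
    print(local_conv(patches, [1 / 3] * 3)[7])  # [0.0, 0.0]
```

In this toy, the single non-zero patch at position 0 changes the attention output of the last patch but leaves the 3-tap convolution output at position 7 at zero, which is the global-versus-local distinction the hybrid design relies on.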

Publication Details
Volume:	13
Pages:	103033–103044
DOI:	10.1109/ACCESS.2025.3579314
ISSN:	2169-3536
Affiliations:	Michael Fu (ORCID 0000-0001-7211-3491), School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
	Chakkrit Tantithamthavorn (ORCID 0000-0002-5516-9984), Faculty of Information Technology, Monash University, Clayton, VIC, Australia
	Trung Le (ORCID 0000-0003-0414-9067), Faculty of Information Technology, Monash University, Clayton, VIC, Australia
Collection:	DOAJ