DAViT: A Domain-Adapted Vision Transformer for Automated Pneumonia Detection and Explanation Using Chest X-Ray Images

Bibliographic Details
Main Authors: Michael Fu, Chakkrit Tantithamthavorn, Trung Le
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
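Subjects: Domain adaptation in vision transformers; deep learning for chest X-ray images; explainable AI for X-ray images; pneumonia detection; pneumonia type classification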
Online Access: https://ieeexplore.ieee.org/document/11031440/
author Michael Fu
Chakkrit Tantithamthavorn
Trung Le
collection DOAJ
description Pneumonia, a leading cause of mortality worldwide, especially among children under five, is typically diagnosed via chest X-rays. However, detection is challenging because expert radiologists must discern subtle patterns. Artificial intelligence (AI) offers a scalable alternative by automating diagnosis through deep learning (DL) models. Despite progress, current methods face two key limitations: 1) reliance on CNNs, which capture local features but may overlook global ones, and 2) the use of models pre-trained on natural image datasets such as ImageNet, which lack the contextual relevance of medical imaging and therefore yield suboptimal performance. To address these challenges, we propose DAViT (Domain-Adapted Vision Transformer), a hybrid architecture that combines Vision Transformers (ViTs) and shallow CNNs with domain adaptation. The ViT leverages self-attention to capture global features, while the CNN extracts local ones. To mitigate domain differences, we adapt the model using a diverse chest X-ray dataset. We evaluate DAViT on a real-world dataset of 5,856 chest X-rays. The results demonstrate that DAViT achieves state-of-the-art performance with a 97% F1-score and 96% AUC for pneumonia detection, outperforming twelve baseline methods. For pneumonia type classification, DAViT achieves an 81% F1-score and 84% AUC, outperforming baselines by 25% to 74%. An ablation study highlights the critical contributions of the domain adaptation, ViT, and CNN components, which collectively enhance performance by 21%. Finally, we apply Grad-CAM on top of DAViT to generate interpretable heatmaps that highlight relevant areas for bacterial and viral pneumonia cases, providing insights to assist medical practitioners in decision-making. These findings indicate the potential of DAViT to assist clinicians in pneumonia diagnosis through improved model accuracy and interpretability. The training code and pre-trained models are available at https://github.com/awsm-research/DAViT
format Article
id doaj-art-5209833db3e049f5bc9afcba7c0b8d65
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-5209833db3e049f5bc9afcba7c0b8d65
Timestamp: 2025-08-20T03:21:34Z
Language code: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Publication date: 2025-01-01
Volume: 13
Pages: 103033-103044
DOI: 10.1109/ACCESS.2025.3579314
IEEE Xplore document: 11031440
Michael Fu (https://orcid.org/0000-0001-7211-3491), School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Chakkrit Tantithamthavorn (https://orcid.org/0000-0002-5516-9984), Faculty of Information Technology, Monash University, Clayton, VIC, Australia
Trung Le (https://orcid.org/0000-0003-0414-9067), Faculty of Information Technology, Monash University, Clayton, VIC, Australia
title DAViT: A Domain-Adapted Vision Transformer for Automated Pneumonia Detection and Explanation Using Chest X-Ray Images
topic Domain adaptation in vision transformers
deep learning for chest X-ray images
explainable AI for X-ray images
pneumonia detection
pneumonia type classification
url https://ieeexplore.ieee.org/document/11031440/