DAViT: A Domain-Adapted Vision Transformer for Automated Pneumonia Detection and Explanation Using Chest X-Ray Images
Pneumonia, a leading cause of mortality worldwide, especially among children under five, is typically diagnosed via chest X-rays. However, detecting it is challenging as expert radiologists must discern subtle patterns. Artificial intelligence (AI) offers a scalable alternative by automating diagnos...
| Main Authors: | Michael Fu, Chakkrit Tantithamthavorn, Trung Le |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
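| Subjects: | Domain adaptation in vision transformers; deep learning for chest X-ray images; explainable AI for X-ray images; pneumonia detection; pneumonia type classification |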
| Online Access: | https://ieeexplore.ieee.org/document/11031440/ |
| _version_ | 1849689606083051520 |
|---|---|
| author | Michael Fu; Chakkrit Tantithamthavorn; Trung Le |
| collection | DOAJ |
| description | Pneumonia, a leading cause of mortality worldwide, especially among children under five, is typically diagnosed via chest X-rays. However, detecting it is challenging because expert radiologists must discern subtle patterns. Artificial intelligence (AI) offers a scalable alternative by automating diagnosis through deep learning (DL) models. Despite progress, current methods face two key limitations: 1) reliance on CNNs, which capture local features but may overlook global ones, and 2) the use of models pre-trained on natural image datasets like ImageNet, which lack the contextual relevance of medical imaging, leading to suboptimal performance. To address these challenges, we propose DAViT (Domain-Adapted Vision Transformer), a hybrid architecture that combines Vision Transformers (ViTs) and shallow CNNs with domain adaptation. The ViT leverages self-attention to capture global features, while the CNN extracts local ones. To mitigate domain differences, we adapt the model using a diverse chest X-ray dataset. We evaluate DAViT on a real-world dataset of 5,856 chest X-rays. The results demonstrate that DAViT achieves state-of-the-art performance with a 97% F1-score and 96% AUC for pneumonia detection, outperforming twelve baseline methods. For pneumonia type classification, DAViT achieves an 81% F1-score and 84% AUC, outperforming baselines by 25% to 74%. An ablation study highlights the critical contributions of the domain adaptation, ViT, and CNN components, which collectively enhance performance by 21%. Finally, we apply Grad-CAM on top of DAViT to generate interpretable heatmaps that highlight relevant areas for bacterial and viral pneumonia cases, providing insights to assist medical practitioners in decision-making. These findings indicate the potential of DAViT to assist clinicians in pneumonia diagnosis through improved model accuracy and interpretability. The training code and pre-trained models are available at https://github.com/awsm-research/DAViT |
| format | Article |
| id | doaj-art-5209833db3e049f5bc9afcba7c0b8d65 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-5209833db3e049f5bc9afcba7c0b8d65; 2025-08-20T03:21:34Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13; pp. 103033-103044; 10.1109/ACCESS.2025.3579314; 11031440; DAViT: A Domain-Adapted Vision Transformer for Automated Pneumonia Detection and Explanation Using Chest X-Ray Images; Michael Fu (https://orcid.org/0000-0001-7211-3491), School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia; Chakkrit Tantithamthavorn (https://orcid.org/0000-0002-5516-9984), Faculty of Information Technology, Monash University, Clayton, VIC, Australia; Trung Le (https://orcid.org/0000-0003-0414-9067), Faculty of Information Technology, Monash University, Clayton, VIC, Australia; https://ieeexplore.ieee.org/document/11031440/; Domain adaptation in vision transformers; deep learning for chest X-ray images; explainable AI for X-ray images; pneumonia detection; pneumonia type classification |
| title | DAViT: A Domain-Adapted Vision Transformer for Automated Pneumonia Detection and Explanation Using Chest X-Ray Images |
| topic | Domain adaptation in vision transformers; deep learning for chest X-ray images; explainable AI for X-ray images; pneumonia detection; pneumonia type classification |
| url | https://ieeexplore.ieee.org/document/11031440/ |