Squeeze-and-Excitation Vision Transformer for Lung Nodule Classification


Saved in:
Bibliographic Details
Main Authors: Xiaozhong Xue, Yanhe Ma, Weiwei Du, Yahui Peng
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects: SE-ViT; lung nodule classification; interpretability; self-attention; squeeze-and-excitation
Online Access:https://ieeexplore.ieee.org/document/10839367/
collection DOAJ
description Lung cancer is one of the deadliest cancers. Early diagnosis can raise the 5-year survival rate to 70%, and lung nodule classification is the basis for early diagnosis. However, because lung nodules are small and vary widely in shape and texture, classifying them accurately is a challenging task. Deep learning models have been widely used for lung nodule classification, but their lack of interpretability has limited clinical adoption. A model that is interpretable and focuses on key features such as shape and texture is therefore needed. Moreover, comparing the performance of different models is difficult because they are trained and tested on different datasets. This study combines two attention mechanisms in a novel deep learning model, the squeeze-and-excitation vision transformer (SE-ViT). Self-attention in SE-ViT extracts features of the relationships between patches, which help characterize the heterogeneity and shape of lung nodules, while the SE mechanism assigns higher weights to patches containing crucial information. Finally, attention maps generated by Grad-CAM demonstrate that the two attention mechanisms enable the model to focus on the key regions for classifying benign and malignant lung nodules. On the public LUNA16 dataset, SE-ViT achieves an average accuracy of 0.863, precision of 0.872, recall of 0.876, F1-score of 0.872, and AUC of 0.862, a significant improvement over ViT and SE-CNN.
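The squeeze-and-excitation idea the abstract describes (score each patch, then reweight it so informative patches contribute more downstream) can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's implementation: the actual SE-ViT presumably uses learned fully connected layers with a reduction ratio inside a transformer, and the scalar parameters `w1`/`w2` below are hypothetical stand-ins for those learned weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_reweight(patch_tokens, w1=1.0, w2=1.0):
    """Squeeze-and-excitation over a sequence of patch feature vectors.

    Squeeze:    average each patch's feature vector into one descriptor.
    Excitation: pass the descriptor through a tiny bottleneck
                (ReLU then sigmoid) to get a gate in (0, 1).
    Scale:      multiply each patch's features by its gate, so patches
                with crucial information receive higher weight.
    """
    weights = []
    for token in patch_tokens:
        s = sum(token) / len(token)          # squeeze
        e = sigmoid(w2 * max(0.0, w1 * s))   # excitation (bottleneck)
        weights.append(e)
    scaled = [[e * v for v in token] for e, token in zip(weights, patch_tokens)]
    return scaled, weights


# Three toy "patch tokens": strong, weak, and background-like features.
tokens = [[2.0, 4.0], [0.2, 0.0], [-1.0, -3.0]]
scaled, w = se_reweight(tokens)
```

In the toy run, the high-activation patch receives the largest gate and the background-like patch is pushed toward the 0.5 floor of the sigmoid, mirroring how the SE branch emphasizes informative patches before the transformer's self-attention relates them to one another.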
id doaj-art-124d51abb7b241fc911325af03d5ed0a
institution Kabale University
issn 2169-3536
spelling doaj-art-124d51abb7b241fc911325af03d5ed0a (record updated 2025-02-12T00:02:36Z)
IEEE Access, vol. 13, 2025-01-01, pp. 24852-24866, ISSN 2169-3536, DOI 10.1109/ACCESS.2025.3529127, IEEE document 10839367
Squeeze-and-Excitation Vision Transformer for Lung Nodule Classification
Xiaozhong Xue (Kyoto Institute of Technology, Kyoto, Japan; ORCID 0000-0001-6302-5318)
Yanhe Ma (Medical Imaging Department, Tianjin Chest Hospital, Tianjin, China)
Weiwei Du (Kyoto Institute of Technology, Kyoto, Japan; ORCID 0000-0002-5133-5615)
Yahui Peng (Beijing Jiaotong University, Beijing, China)
Online access: https://ieeexplore.ieee.org/document/10839367/
Keywords: SE-ViT; lung nodule classification; interpretability; self-attention; squeeze-and-excitation
title Squeeze-and-Excitation Vision Transformer for Lung Nodule Classification
topic SE-ViT
lung nodule classification
interpretability
self-attention
squeeze-and-excitation
url https://ieeexplore.ieee.org/document/10839367/