Squeeze-and-Excitation Vision Transformer for Lung Nodule Classification
Main Authors:
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10839367/
Summary: Lung cancer is one of the deadliest cancers. Early diagnosis can increase the 5-year survival rate to 70%, and lung nodule classification is the basis for early diagnosis. However, due to the small scale of lung nodules and their variations in shape and texture, accurate classification is a challenging task. Deep learning models have been widely used for lung nodule classification, but because these models lack interpretability, they have not been widely adopted in clinical practice. A model that is interpretable and focuses on key features such as shape and texture is therefore necessary. Moreover, comparing the performance of different models is difficult because they are trained and tested on different datasets. This study combines two attention mechanisms and proposes a novel deep learning model called the squeeze-and-excitation vision transformer (SE-ViT). Self-attention in SE-ViT extracts features describing the relationships between patches, which can be used to characterize the heterogeneity and shape of lung nodules, while the SE mechanism assigns higher weights to patches containing crucial information. Attention maps generated by Grad-CAM demonstrate that the two attention mechanisms enable the model to focus on the key areas for classifying benign and malignant lung nodules. SE-ViT achieves an average accuracy of 0.863, precision of 0.872, recall of 0.876, F1-score of 0.872, and AUC of 0.862 on the public LUNA16 dataset, a significant improvement over ViT and SE-CNN.
ISSN: 2169-3536
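The abstract describes re-weighting ViT patch tokens with a squeeze-and-excitation (SE) block so that patches carrying crucial information receive higher weights before self-attention relates them. Below is a minimal PyTorch sketch of one plausible arrangement of that idea; the module names, the placement of SE before the encoder, the reduction ratio, and all dimensions are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the SE-ViT idea from the abstract: a standard transformer encoder
# over patch embeddings, with an SE block that learns per-patch weights.
# All hyperparameters and the SE placement are assumptions for illustration.
import torch
import torch.nn as nn

class PatchSE(nn.Module):
    """Squeeze-and-excitation over patch tokens: 'squeeze' each patch embedding
    to a scalar, then 'excite' by predicting a per-patch weight in (0, 1)."""
    def __init__(self, num_patches: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_patches, num_patches // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_patches // reduction, num_patches),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, embed_dim)
        squeezed = x.mean(dim=-1)           # (batch, num_patches)
        weights = self.fc(squeezed)         # per-patch weights in (0, 1)
        return x * weights.unsqueeze(-1)    # emphasize informative patches

class SEViT(nn.Module):
    def __init__(self, num_patches=196, embed_dim=256, depth=6, heads=8, classes=2):
        super().__init__()
        self.se = PatchSE(num_patches)
        layer = nn.TransformerEncoderLayer(embed_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(embed_dim, classes)  # benign vs. malignant

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: patch embeddings, (batch, num_patches, embed_dim)
        x = self.encoder(self.se(tokens))   # SE re-weighting, then self-attention
        return self.head(x.mean(dim=1))     # mean-pool patches, classify
```

In this reading, the SE block supplies channel-style importance over patches while self-attention models inter-patch relationships (heterogeneity, shape); Grad-CAM, as the abstract notes, can then be applied to the trained model to visualize which regions drive the benign/malignant decision.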