Comparing 2D and 3D Feature Extraction Methods for Lung Adenocarcinoma Prediction Using CT Scans: A Cross-Cohort Study

Lung cancer stands as the most prevalent and deadliest type of cancer, with adenocarcinoma being the most common subtype. Computed Tomography (CT) is widely used for detecting tumours and their phenotype characteristics, for an early and accurate diagnosis that impacts patient outcomes. Machine lear...

Bibliographic Details
Main Authors: Margarida Gouveia, Tânia Mendes, Eduardo M. Rodrigues, Hélder P. Oliveira, Tania Pereira
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Applied Sciences
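Subjects: adenocarcinoma; computed tomography scans; deep learning; eXtreme gradient boosting; lung cancer subtype; machine learning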
Online Access: https://www.mdpi.com/2076-3417/15/3/1148
author Margarida Gouveia
Tânia Mendes
Eduardo M. Rodrigues
Hélder P. Oliveira
Tania Pereira
collection DOAJ
description Lung cancer stands as the most prevalent and deadliest type of cancer, with adenocarcinoma being the most common subtype. Computed Tomography (CT) is widely used for detecting tumours and their phenotype characteristics, for an early and accurate diagnosis that impacts patient outcomes. Machine learning algorithms have already shown the potential to recognize patterns in CT scans to classify the cancer subtype. In this work, two distinct pipelines were employed to perform binary classification between adenocarcinoma and non-adenocarcinoma. Firstly, radiomic features were classified by Random Forest and eXtreme Gradient Boosting classifiers. Next, a deep learning approach, based on a Residual Neural Network and a Transformer-based architecture, was utilised. Both 2D and 3D CT data were initially explored, with the Lung-PET-CT-Dx dataset being employed for training and the NSCLC-Radiomics and NSCLC-Radiogenomics datasets used for external evaluation. Overall, the 3D models outperformed the 2D ones, with the best result being achieved by the Hybrid Vision Transformer, with an AUC of 0.869 and a balanced accuracy of 0.816 on the internal test set. However, a lack of generalization capability was observed across all models, with the performances decreasing on the external test sets, a limitation that should be studied and addressed in future work.
format Article
id doaj-art-26ed2e91aa9241f58a90375bcc63238a
institution DOAJ
issn 2076-3417
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-26ed2e91aa9241f58a90375bcc63238a
2025-08-20T02:48:06Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-01-01
Volume 15, Issue 3, Article 1148
10.3390/app15031148
Comparing 2D and 3D Feature Extraction Methods for Lung Adenocarcinoma Prediction Using CT Scans: A Cross-Cohort Study
Margarida Gouveia, Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), 4200-465 Porto, Portugal
Tânia Mendes, Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), 4200-465 Porto, Portugal
Eduardo M. Rodrigues, Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), 4200-465 Porto, Portugal
Hélder P. Oliveira, Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), 4200-465 Porto, Portugal
Tania Pereira, Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), 4200-465 Porto, Portugal
https://www.mdpi.com/2076-3417/15/3/1148
title Comparing 2D and 3D Feature Extraction Methods for Lung Adenocarcinoma Prediction Using CT Scans: A Cross-Cohort Study
topic adenocarcinoma
computed tomography scans
deep learning
eXtreme gradient boosting
lung cancer subtype
machine learning
url https://www.mdpi.com/2076-3417/15/3/1148