Deep learning with data transformation improves cancer risk prediction in oral precancerous conditions


Bibliographic Details
Main Authors: John Adeoye, Yuxiong Su
Format: Article
Language: English
Published: Elsevier 2025-05-01
Series: Intelligent Medicine
Online Access: http://www.sciencedirect.com/science/article/pii/S2667102625000300
Description
Summary:

Background: Oral cancer is the most common head and neck malignancy and may develop from oral leukoplakia (OL) and oral lichenoid disease (OLD). Machine learning classifiers using structured (tabular) data have been employed to predict malignant transformation in OL and OLD. However, current models require improved discrimination, and their frameworks may limit feature fusion and multimodal risk prediction. This study therefore investigates whether tabular-to-image data conversion and deep learning (DL) based on convolutional neural networks (CNNs) can improve malignant transformation prediction compared with traditional classifiers.

Methods: Retrospective data of 1,010 patients with OL and OLD treated at Queen Mary Hospital, Hong Kong, from January 2003 to December 2023 were used to construct artificial intelligence-based models for oral cancer risk stratification in OL/OLD. Twenty-five input features and information on oral cancer development in OL/OLD were retrieved from electronic health records. Tabular-to-2D image transformation was achieved by creating a feature matrix from encoded labels of the input variables, arranged according to their correlation coefficients. The resulting 2D images were then used to fine-tune five pre-trained DL models (VGG16, VGG19, MobileNetV2, ResNet50, and EfficientNet-B0). Area under the receiver operating characteristic curve (AUC), Brier scores, and net benefit of the DL models were calculated and compared with five traditional classifiers based on structured data and with the binary epithelial dysplasia grading system (the current method).

Results: During validation, the DL models had better AUC values (0.893–0.955) and Brier scores (0.072–0.106) than the traditional classifiers (AUC: 0.887–0.941; Brier score: 0.074–0.136). During internal testing, VGG16 and VGG19 had better AUC values and Brier scores than the other CNNs (AUC: 0.998–1.00; Brier score: 0.036–0.044) and the best traditional classifier, random forest (AUC: 0.906; Brier score: 0.153). VGG16 and VGG19 also outperformed random forest in discrimination and calibration during external testing (AUC: 1.00 vs. 0.976; Brier score: 0.022–0.034 vs. 0.129). The best CNNs likewise showed better discrimination and calibration than binary dysplasia grading at both internal and external testing. Overall, decision curve analysis showed that the optimal DL models with transformed data had a higher net benefit than random forest and binary dysplasia grading.

Conclusion: Tabular-to-2D image data transformation may improve the use of structured input features for developing optimal intelligent models for oral cancer risk prediction in OL and OLD using convolutional networks. This approach may also enable robust handling of structured data in multimodal DL frameworks for oncological outcome prediction.
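The tabular-to-2D image conversion described in the Methods can be sketched as follows. This is a minimal illustration only, assuming Pearson correlation with the outcome for feature ordering and min–max scaling to a grayscale range; the authors' exact encoding and matrix layout may differ, and all function and variable names here are hypothetical:

```python
import numpy as np

def tabular_to_image(X, y, side=5):
    """Turn each row of a tabular dataset into a side x side grayscale
    matrix, ordering feature columns by |Pearson correlation| with the
    outcome (most correlated first), then min-max scaling to [0, 255]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Correlation of each feature column with the binary outcome
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(corr))  # strongest association first
    X_ord = X[:, order]
    # Scale each column to [0, 255] so the matrix reads as an image
    lo, hi = X_ord.min(axis=0), X_ord.max(axis=0)
    X_scaled = (X_ord - lo) / np.where(hi > lo, hi - lo, 1.0) * 255.0
    # Reshape each patient's 25 features into a 5 x 5 "image"
    return X_scaled.reshape(-1, side, side)

# Example: 8 patients, 25 encoded clinical features, binary outcome
rng = np.random.default_rng(0)
X = rng.random((8, 25))
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])
imgs = tabular_to_image(X, y)
print(imgs.shape)  # (8, 5, 5)
```

Images produced this way would then be resized and channel-replicated as needed before being passed to pre-trained CNN backbones such as VGG16.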
ISSN:2667-1026