Integrative machine learning model for subtype identification and prognostic prediction in lung squamous cell carcinoma

Bibliographic Details
Main Authors: Guangliang Duan, Qi Huo, Wei Ni, Fei Ding, Yuefang Ye, Tingting Tang, Huiping Dai
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Discover Oncology
Online Access: https://doi.org/10.1007/s12672-025-02560-w
Description
Summary: Abstract

Background: Lung squamous cell carcinoma (LUSC) is a leading cause of cancer-related mortality, and tumor heterogeneity could result in diverse prognostic subtypes. Traditional prognostic factors, like tumor, node, and metastasis (TNM) staging, offer limited predictive accuracy. This study aims to identify LUSC subtypes and develop predictive models that have the potential to improve prognosis prediction accuracy and support personalized treatment.

Methods: Expression and clinical data were collected from three datasets. One dataset (TCGA-LUSC) was used as a training set, while the others (GSE30219 and GSE73403) were independent testing sets. Unsupervised clustering was applied to the training set to identify LUSC subtypes. The relationship between survival outcomes and these identified subtypes was validated in the testing sets using binary machine learning models and survival curve analysis. The impact of chemotherapy on the prognosis for subtypes was also presented. Subsequently, four survival machine learning models were developed to predict LUSC prognosis. These models were validated in the testing sets and integrated into an online tool to assist in survival prediction.

Results: Two subtypes, C1 and C2, were identified in the training set. The C1 subtype was associated with poorer survival outcomes and was enriched in cancer-associated fibroblasts and macrophages. In contrast, the C2 subtype correlated with better outcomes and was enriched in CD8+ T cells. Regarding chemotherapy, the C2 subtype with chemotherapy showed the best survival outcomes compared to other groups. A 9-gene signature was derived from the model’s importance values for subtype prediction and included TGM2, AOC3, TBXA2R, RGS3, DLC1, MMP19, ACVRL1, TCF21, and TIMP3. This signature outperformed 14 published signatures and clinical variables at survival prediction with the highest time-dependent AUC (tdAUC) and concordance index (C-index). Four machine learning models were developed using this signature, achieving tdAUC values of 0.712 and 0.684 and C-index values of 0.682 and 0.625 in the independent testing sets. An online tool for predicting survival probabilities for LUSC patients up to 10 years post-treatment is available at https://hznuduan.shinyapps.io/LCSP/.

Conclusion: We identified two LUSC subtypes by unsupervised clustering and developed an online tool for prognosis prediction using supervised machine learning models.
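For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C-index in plain Python. It is an illustration only: the abstract does not say how the authors computed the metric, and the function name `concordance_index`, the tie handling, and the toy data are assumptions made for this example, not taken from the article.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, not the article's code).

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means worse expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk died earlier: concordant pair
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the second one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risks = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(toy_times, toy_events, toy_risks))  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the 0.682 and 0.625 reported for the independent testing sets.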
ISSN: 2730-6011