Development and validation of machine learning models for distant metastasis of primary hepatic carcinoma: a population-based study
Abstract Background: Primary liver cancer is the sixth most common cancer globally and ranks third in cancer-related mortality. Patients with distant metastasis (PLCDM) have particularly low survival rates and are more difficult to treat. This study aims to identify risk factors associated with dista...
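| Main Authors: | Cong Lu, Ying He, Chun-Ru Chen, Lun Wu, Dan Song, Chen-Hong Wang, Le-Qing Zhang, Jing-Yi Miao, Yong-Bin Zheng, Wei Wang |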
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-06-01 |
| Series: | Discover Oncology |
| Subjects: | Machine learning; Primary liver cancer; Distant metastasis; SEER |
| Online Access: | https://doi.org/10.1007/s12672-025-02894-5 |
| _version_ | 1850207048383332352 |
|---|---|
| author | Cong Lu, Ying He, Chun-Ru Chen, Lun Wu, Dan Song, Chen-Hong Wang, Le-Qing Zhang, Jing-Yi Miao, Yong-Bin Zheng, Wei Wang |
| author_sort | Cong Lu |
| collection | DOAJ |
| description | Abstract Background: Primary liver cancer is the sixth most common cancer globally and ranks third in cancer-related mortality. Patients with distant metastasis (PLCDM) have particularly low survival rates and are more difficult to treat. This study aims to identify risk factors associated with distant metastasis and overall survival (OS) in primary liver cancer and to determine the optimal predictive models using machine learning. Methods: We extracted data from the SEER database (Incidence - SEER Research Data, 17 Registries, Nov 2022 Sub (2000–2020)) and identified risk factors for distant metastasis using logistic regression. Eight machine learning models were constructed using the “tidymodels” package in R and evaluated based on ROC curves, AUC, and accuracy. Cox regression was used to identify risk factors for OS, and Cox and Random Survival Forest (RSF) models were compared using time-dependent ROC curves. The best-performing model was interpreted using Shapley analysis. We also developed user-friendly web applications using the “shiny” package in R for clinical use. Results: Multivariate analysis identified grade, T stage, N stage, tumor size, and surgery as independent risk factors for PLCDM. The Random Forest (RF) model showed the best performance with AUC values of 0.836, 0.817, and 0.846 in the training, internal validation, and external validation cohorts, respectively, and favorable Brier scores and accuracy. Shapley analysis ranked the risk factors by contribution as surgery, T stage, tumor size, N stage, and grade. Cox regression identified grade, surgery, and T stage as independent prognostic factors for OS. The Cox model outperformed the RSF model in time-dependent ROC analysis. Calibration and decision curve analysis (DCA) further confirmed its strong predictive performance and clinical utility. Shapley analysis ranked the risk factors as grade, surgery, and T stage. Conclusions: We successfully constructed and validated optimal models for predicting PLCDM and its prognosis. These models provide valuable tools to guide clinical decision-making for PLCDM. |
| format | Article |
| id | doaj-art-e9d227c276cd498b8bb7b048fd9e2757 |
| institution | OA Journals |
| issn | 2730-6011 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Oncology |
| spelling | doaj-art-e9d227c276cd498b8bb7b048fd9e2757; 2025-08-20T02:10:38Z; eng; Springer; Discover Oncology; 2730-6011; 2025-06-01; 161114; 10.1007/s12672-025-02894-5; Development and validation of machine learning models for distant metastasis of primary hepatic carcinoma: a population-based study; Cong Lu (0), Ying He (1), Chun-Ru Chen (2), Lun Wu (3), Dan Song (4), Chen-Hong Wang (5), Le-Qing Zhang (6), Jing-Yi Miao (7), Yong-Bin Zheng (8), Wei Wang (9); Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University; Department of Stomatology, Renmin Hospital of Wuhan University; Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University; Sinopharm Dongfeng General Hospital, Hubei University of Medicine; Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University; Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University; Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University; Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University; Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University; Department of Hepatobiliary Surgery, East Hospital, Renmin Hospital of Wuhan University; https://doi.org/10.1007/s12672-025-02894-5; Machine learning; Primary liver cancer; Distant metastasis; SEER |
| title | Development and validation of machine learning models for distant metastasis of primary hepatic carcinoma: a population-based study |
| topic | Machine learning; Primary liver cancer; Distant metastasis; SEER |
| url | https://doi.org/10.1007/s12672-025-02894-5 |