An explainable machine learning model for predicting the risk of distant metastasis in intrahepatic cholangiocarcinoma: a population-based cohort study
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-06-01 |
| Series: | Discover Oncology |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s12672-025-02952-y |
| Summary: | Abstract Background Distant metastasis (DM) in intrahepatic cholangiocarcinoma (ICC) is associated with poor prognosis and high mortality. Therefore, developing an effective early prediction method for DM risk is crucial for tailoring personalized treatment plans and improving patient outcomes. Methods This study included data from eligible ICC patients collected from the Surveillance, Epidemiology, and End Results (SEER) database between 2004 and 2021. Feature selection was performed using three methods: least absolute shrinkage and selection operator (LASSO) regression, the Boruta algorithm, and recursive feature elimination (RFE). Eight machine learning (ML) algorithms were used to develop predictive models. Model performance was evaluated and compared using the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), decision curve analysis (DCA), and calibration curves. The SHapley Additive exPlanations (SHAP) method was applied to rank feature importance and interpret the final model. Results This study included 8536 ICC patients, of whom 2816 (33%) had DM. The intersection of the three feature selection methods identified 10 predictive factors. Among the 8 ML models, the gradient boosting machine (GBM) model achieved the highest AUC (0.802), AUPRC (0.571), and accuracy (0.713), as well as the lowest Brier score (0.177), indicating comparatively robust overall performance. Calibration curves and DCA indicated that the GBM model has good clinical decision-making capability and predictive performance. SHAP analysis identified the top 10 most relevant features, ranked by relative importance: surgery, N stage, tumor grade, T stage, tumor size, radiotherapy, tumor number, age at diagnosis, chemotherapy, and number of resected lymph nodes (LNs). Additionally, a web-based online calculator was developed to predict the risk of DM in ICC patients, available at https://bijinzhe.shinyapps.io/icc_dm_shiny/. Conclusion The GBM model demonstrated considerable potential in predicting the risk of DM in ICC patients. This could assist clinicians in formulating personalized treatment strategies, ultimately improving the overall prognosis of ICC patients. |
|---|---|
| ISSN: | 2730-6011 |
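
The summary reports a Brier score and an AUC, and describes taking the intersection of three feature selection methods. As a rough illustration of what those quantities mean, the sketch below computes them in plain Python. It is not the authors' code: the function names, the toy labels and probabilities, and the feature lists are all hypothetical and chosen only for demonstration.

```python
from typing import Iterable, Sequence, Set


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and 0/1 outcome (lower is better)."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive case outscores a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus_features(*selections: Iterable[str]) -> Set[str]:
    """Keep only the features that every selection method picked."""
    return set.intersection(*(set(s) for s in selections))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = distant metastasis, 0 = none.
    y_true = [1, 0, 1, 0, 0, 1]
    y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]
    print("Brier score:", round(brier_score(y_true, y_prob), 3))
    print("AUC:", round(roc_auc(y_true, y_prob), 3))

    # Hypothetical outputs of three selection methods (not the study's actual lists).
    lasso = ["surgery", "n_stage", "age"]
    boruta = ["surgery", "n_stage", "grade"]
    rfe = ["n_stage", "surgery"]
    print("Consensus:", sorted(consensus_features(lasso, boruta, rfe)))
```

The pairwise AUC shown here is quadratic in the number of cases, which is fine for illustration; for a cohort of several thousand patients a rank-based or library implementation would be the more practical choice.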