Machine learning and SHAP value interpretation for predicting the response to neoadjuvant chemotherapy and long-term clinical outcomes in Chinese female breast cancer
| Main Authors: | Quan Yuan, Rongjie Ye, Yao Qian, Hao Yu, Yuexin Zhou, Xiaoqiao Cui, Feng Liu, Ming Niu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2025-12-01 |
| Series: | Annals of Medicine |
| Subjects: | |
| Online Access: | https://www.tandfonline.com/doi/10.1080/07853890.2025.2541316 |
| Field | Value |
|---|---|
| author | Quan Yuan, Rongjie Ye, Yao Qian, Hao Yu, Yuexin Zhou, Xiaoqiao Cui, Feng Liu, Ming Niu |
| collection | DOAJ |
| description | Background: Most predictive models for neoadjuvant chemotherapy (NACT) in breast cancer (BC) are built on limited data and lack interpretability, and reports from China in this field are notably scarce. In addition, this study is the first to incorporate the Advanced Lung Cancer Inflammation Index (ALI) into such a model and evaluate its effectiveness. Methods: Data from 3,036 female BC patients who received NACT at Heilongjiang Provincial Tumor Hospital between 2008 and 2019 (median follow-up, 7.28 years) were analyzed. After screening, 2,909 patients were randomly assigned to training and validation cohorts in a 7:3 ratio. eXtreme Gradient Boosting (XGBoost), Gradient Boosting Classifier (GBC), and Support Vector Machine (SVM) models were compared to identify the best model for predicting pathological complete response (pCR), and SHapley Additive exPlanations (SHAP) were used to interpret its key features. The Least Absolute Shrinkage and Selection Operator (LASSO) Cox algorithm, combined with XGBoost and Random Forest (RF) models, identified 9 overlapping prognostic features, which improved the predictive accuracy of a nomogram for overall survival (OS). Kaplan–Meier (KM) analysis revealed differing prognostic outcomes across subgroups. Results: The XGBoost model performed best in predicting pCR, with area under the curve (AUC) values of 0.88 in the training set and 0.72 in the validation set. SHAP analysis identified estrogen receptor (ER) status, HER2 status, ALI, and albumin (Alb) level as the four most important features. The prognostic model also achieved high AUC values in both the training and test sets. KM analysis showed that lower ALI, non-pCR, and triple-negative BC were associated with worse clinical outcomes. However, the adverse prognostic impact of ALI in this cohort was seen mainly in long-term recurrence outcomes and in the non-pCR group. Conclusion: This study is the first to incorporate ALI into a prediction model for BC patients who completed NACT and to develop a large-sample XGBoost-based model. Owing to the specific nature of the indicators, training and validation were conducted on real-world clinical data. |
| format | Article |
| id | doaj-art-d2a5115830224592b9d1bb4cba333440 |
| institution | Kabale University |
| issn | 0785-3890, 1365-2060 |
| language | English |
| publishDate | 2025-12-01 |
| publisher | Taylor & Francis Group |
| record_format | Article |
| series | Annals of Medicine |
| affiliation | Quan Yuan, Yao Qian, Yuexin Zhou, Xiaoqiao Cui, Feng Liu, Ming Niu: Department of Breast Surgery, Harbin Medical University Cancer Hospital, Harbin, China; Rongjie Ye: Quanzhou First Hospital Affiliated to Fujian Medical University, Quanzhou, China; Hao Yu: School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, China |
| volume | 57 |
| issue | 1 |
| doi | 10.1080/07853890.2025.2541316 |
| title | Machine learning and SHAP value interpretation for predicting the response to neoadjuvant chemotherapy and long-term clinical outcomes in Chinese female breast cancer |
| topic | Breast cancer; neoadjuvant chemotherapy; advanced lung cancer inflammation index; machine learning; SHapley Additive exPlanations |
| url | https://www.tandfonline.com/doi/10.1080/07853890.2025.2541316 |
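The record does not spell out how ALI is calculated. A common definition in the literature is ALI = BMI (kg/m²) × serum albumin (g/dL) / NLR, where NLR is the neutrophil-to-lymphocyte ratio. The Python sketch below assumes that definition and the g/dL albumin unit; the function name is hypothetical, and the authors' exact formula, units, and cut-off are not stated in this record.

```python
def advanced_lung_cancer_inflammation_index(
    weight_kg: float,
    height_m: float,
    albumin_g_per_dl: float,
    neutrophils: float,
    lymphocytes: float,
) -> float:
    """ALI = BMI * albumin / NLR.

    Assumption: this is the commonly cited definition with albumin in g/dL;
    it is not taken from the record above.
    """
    if height_m <= 0 or neutrophils <= 0 or lymphocytes <= 0:
        raise ValueError("height and cell counts must be positive")
    bmi = weight_kg / height_m ** 2   # body-mass index, kg/m^2
    nlr = neutrophils / lymphocytes   # neutrophil-to-lymphocyte ratio (unitless)
    return bmi * albumin_g_per_dl / nlr
```

For example, a 64 kg, 1.60 m patient (BMI 25) with albumin 4.0 g/dL, neutrophils 4.0 and lymphocytes 2.0 (×10⁹/L, NLR 2) has an ALI of about 50. Lower values reflect some combination of lower BMI, lower albumin, or a higher NLR.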
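The AUC values reported for pCR prediction (0.88 in training, 0.72 in validation) can be read as the probability that a randomly chosen pCR case receives a higher predicted score than a randomly chosen non-pCR case, with ties counted as one half. The sketch below computes AUC directly from that pairwise definition; the function name is illustrative, and in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5.

    labels: 1 for the positive class (e.g. pCR), 0 for the negative class
    scores: model output for each case (higher means more likely positive)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    # Compare every positive/negative pair of cases.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop costs O(n_pos × n_neg), which is fine for illustration but slow for cohorts of thousands; sorting-based implementations avoid that cost. A gap such as 0.88 versus 0.72 between training and validation is the kind of difference this metric makes visible when a model fits its training data more closely than new patients.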
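SHAP attributes a single prediction to its input features using Shapley values. To illustrate the underlying idea, the sketch below computes exact Shapley values by enumerating every feature subset, with "absent" features taking their value from one reference patient. This single-baseline setup, the function name, and the toy model are assumptions for demonstration only. The `shap` package's `TreeExplainer`, commonly used with XGBoost, computes SHAP values for tree ensembles without this exponential enumeration, and its results depend on the background data and options chosen.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of `model` at `x`.

    Features outside a coalition are set to their `baseline` value
    (a single-reference simplification). Cost grows as 2^n, so this is
    only practical for a handful of features.
    """
    n = len(x)

    def value(subset: frozenset[int]) -> float:
        # Features in `subset` come from x; all others come from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s_set = frozenset(s)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s_set | {i}) - value(s_set))
    return phi
```

For the toy model f(z) = 2·z0 + 3·z1 + z0·z2 with x = (1, 1, 1) and an all-zero baseline, the values are (2.5, 3.0, 0.5): the interaction term is split evenly between features 0 and 2, and the values sum to f(x) - f(baseline) = 6. Global rankings such as "the four most important features" are typically obtained by averaging absolute SHAP values over patients.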
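Kaplan–Meier analysis estimates the survival function from follow-up times in which some patients are censored (still event-free at last contact). The product-limit sketch below is a plain-Python illustration with a hypothetical function name; comparing curves between groups such as pCR and non-pCR, or low and high ALI, would additionally need a test like the log-rank test, which packages such as `lifelines` provide.

```python
from collections import Counter
from typing import Sequence


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> list[tuple[float, float]]:
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) occurred at that time, 0 if censored
    Returns (time, S(t)) pairs at each distinct time with at least one event.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # everyone last observed at t leaves the risk set after t
    n_at_risk = len(times)
    survival = 1.0
    curve: list[tuple[float, float]] = []
    for t in sorted(leaving):
        d = deaths[t]
        if d > 0:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve
```

For times (1, 2, 2, 3, 4) with events (1, 1, 0, 1, 0), the estimate steps to 0.8, 0.6, and 0.3 at t = 1, 2, 3; the patient censored at t = 2 is counted as at risk at that time, which is the usual convention.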