Machine learning and SHAP value interpretation for predicting the response to neoadjuvant chemotherapy and long-term clinical outcomes in Chinese female breast cancer

Bibliographic Details
Main Authors: Quan Yuan, Rongjie Ye, Yao Qian, Hao Yu, Yuexin Zhou, Xiaoqiao Cui, Feng Liu, Ming Niu
Format: Article
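Language: English
Published: Taylor & Francis Group 2025-12-01
Series: Annals of Medicine
Subjects: Breast cancer; neoadjuvant chemotherapy; advanced lung cancer inflammation index; machine learning; SHapley Additive exPlanations
Online Access: https://www.tandfonline.com/doi/10.1080/07853890.2025.2541316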
collection DOAJ
description Background: Most models for predicting the response to neoadjuvant chemotherapy (NACT) in breast cancer (BC) suffer from insufficient data and lack interpretability. Additionally, reports from China in this field are notably absent. This study is also the first to integrate the Advanced Lung Cancer Inflammation Index (ALI) into such a model and evaluate its effectiveness.
Methods: Data from 3,036 female BC patients who received NACT at Heilongjiang Provincial Tumor Hospital (2008–2019, median follow-up 7.28 years) were analyzed. After screening, 2,909 patients were randomized into training and validation cohorts (7:3). eXtreme Gradient Boosting (XGBoost), Gradient Boosting Classifier (GBC), and Support Vector Machine (SVM) models were used to identify the best model for predicting pathological complete response (pCR), and SHapley Additive exPlanations (SHAP) were used to interpret its key features. The Least Absolute Shrinkage and Selection Operator (LASSO) Cox algorithm, combined with XGBoost and Random Forest (RF) models, identified 9 overlapping prognostic features, enhancing the nomogram’s predictive accuracy for overall survival (OS). Kaplan–Meier (KM) analysis revealed varying prognostic outcomes.
Results: The XGBoost model performed best in predicting pCR, with Area Under Curve (AUC) values of 0.88 and 0.72 in the training and validation sets, respectively. SHAP analysis indicated that ER status, HER2 status, ALI, and albumin (Alb) level were the four most important features. The prognostic model was also validated by high AUC values in both the training and test sets. KM analysis indicated that lower ALI, non-pCR, and triple-negative BC were associated with worse clinical outcomes. However, the adverse impact of ALI on prognosis in this cohort was mainly reflected in the long-term recurrence outcomes and in the non-pCR groups.
Conclusion: This study is the first to introduce ALI into a prediction model for BC patients completing NACT and to develop a large-sample model based on XGBoost. Owing to the particularity of the indicators, training and validation were conducted on real clinical data.
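The description names ALI and a 7:3 random split without spelling either out. The sketch below is an illustration only, not the authors' code: it uses the commonly cited definition ALI = BMI x serum albumin / neutrophil-to-lymphocyte ratio, which this record does not state, and it assumes albumin in g/dL. The function names, the seed, and the rounding rule for the split are hypothetical choices; the record does not report the resulting cohort sizes.

```python
import random


def ali(bmi, albumin_g_dl, neutrophils, lymphocytes):
    """Commonly cited ALI definition (an assumption here, not taken from the record):
    ALI = BMI * albumin / NLR, where NLR = neutrophil count / lymphocyte count."""
    if neutrophils <= 0 or lymphocytes <= 0:
        raise ValueError("cell counts must be positive")
    nlr = neutrophils / lymphocytes
    return bmi * albumin_g_dl / nlr


def split_7_to_3(patient_ids, seed=0):
    """Shuffle the IDs reproducibly and cut them at 70% (hypothetical rounding rule)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * 0.7)
    return ids[:cut], ids[cut:]


# Made-up example patient: BMI 22, albumin 4 g/dL, neutrophils 4, lymphocytes 2 (NLR = 2).
example_ali = ali(22.0, 4.0, 4.0, 2.0)

# 2,909 screened patients, as reported in the description.
train_ids, valid_ids = split_7_to_3(range(2909))
print(example_ali, len(train_ids), len(valid_ids))
```

Under this definition a lower ALI reflects lower BMI or albumin, or a higher neutrophil-to-lymphocyte ratio, which matches the direction of the KM finding that lower ALI accompanied worse outcomes.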
id doaj-art-d2a5115830224592b9d1bb4cba333440
institution Kabale University
issn 0785-3890
1365-2060
spelling doaj-art-d2a5115830224592b9d1bb4cba333440
2025-08-20T03:42:40Z
eng
Taylor & Francis Group, Annals of Medicine, ISSN 0785-3890, 1365-2060
2025-12-01, volume 57, issue 1
10.1080/07853890.2025.2541316
Quan Yuan: Department of Breast Surgery, Harbin Medical University Cancer Hospital, Harbin, China
Rongjie Ye: Quanzhou First Hospital Affiliated to Fujian Medical University, Quanzhou, China
Yao Qian: Department of Breast Surgery, Harbin Medical University Cancer Hospital, Harbin, China
Hao Yu: School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, China
Yuexin Zhou: Department of Breast Surgery, Harbin Medical University Cancer Hospital, Harbin, China
Xiaoqiao Cui: Department of Breast Surgery, Harbin Medical University Cancer Hospital, Harbin, China
Feng Liu: Department of Breast Surgery, Harbin Medical University Cancer Hospital, Harbin, China
Ming Niu: Department of Breast Surgery, Harbin Medical University Cancer Hospital, Harbin, China
topic Breast cancer
neoadjuvant chemotherapy
advanced lung cancer inflammation index
machine learning
SHapley Additive exPlanations