Risk factors of breast cancer patients developing multiple primary cancers: a retrospective study and establishing/testing of machine learning models
Abstract Background Breast cancer is a prevalent malignancy globally, with approximately 1 in 10 breast cancer patients at risk of developing additional primary malignant tumors. This study seeks to explore the risk factors linked to the development of multiple primary cancers (MPCs) in breast cance...
| Main Authors: | Yudi Jin, Tong Su, Yanjia Fan, Yineng Zheng, Cheng Tian, Zubin Ouyang, Fajin Lv |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC 2025-07-01 |
| Series: | BMC Medical Informatics and Decision Making |
| Subjects: | Breast cancer; Multiple primary cancer; Risk factors; Random forest; Logistic regression |
| Online Access: | https://doi.org/10.1186/s12911-025-03086-5 |
| _version_ | 1849333069640630272 |
|---|---|
| author | Yudi Jin; Tong Su; Yanjia Fan; Yineng Zheng; Cheng Tian; Zubin Ouyang; Fajin Lv |
| author_facet | Yudi Jin; Tong Su; Yanjia Fan; Yineng Zheng; Cheng Tian; Zubin Ouyang; Fajin Lv |
| author_sort | Yudi Jin |
| collection | DOAJ |
| description | Abstract Background: Breast cancer is a prevalent malignancy globally, with approximately 1 in 10 breast cancer patients at risk of developing additional primary malignant tumors. This study seeks to explore the risk factors linked to the development of multiple primary cancers (MPCs) in breast cancer patients and to develop predictive models to aid in clinical decision-making. Methods: A cohort of patients from the Surveillance, Epidemiology, and End Results (SEER) database was analyzed to identify key factors contributing to the occurrence of MPCs. Machine learning models, including logistic regression and random forest, were established and tested to predict the risk of developing multiple primary cancers. Results: A total of 120,434 breast cancer patients were included in the study. After random undersampling of the majority class and random selection of a quarter of the population, there were 3432 patients in each of the one primary breast cancer (OPBC) group and the MPCs group. A logistic regression model and a random forest model were constructed based on age, marital status, laterality, histological type, tumor grade, American Joint Committee on Cancer (AJCC) stage, T and N stage, molecular subtype, surgery, chemotherapy, and radiotherapy. The logistic regression model achieved an area under the curve (AUC) of 0.902, a specificity of 0.905, and a sensitivity of 0.767 in the training set, and an AUC of 0.886, a specificity of 0.882, and a sensitivity of 0.782 in the testing set. The random forest model achieved an AUC of 0.955, a specificity of 0.916, and a sensitivity of 0.859 in the training set, and an AUC of 0.874, a specificity of 0.858, and a sensitivity of 0.769 in the testing set. A nomogram was plotted based on the logistic regression model. The Kaplan-Meier (K-M) curves demonstrated statistically significant differences in prognosis among the various risk groups that were stratified based on the nomogram. Conclusions: This study assessed several risk factors influencing the development of MPCs in breast cancer patients. The machine learning model could offer a practical tool for personalized risk assessment in this patient population. |
| format | Article |
| id | doaj-art-5cc4ac97037042db8dc7c09f1ffbd994 |
| institution | Kabale University |
| issn | 1472-6947 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Informatics and Decision Making |
| spelling | doaj-art-5cc4ac97037042db8dc7c09f1ffbd994; 2025-08-20T03:46:00Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-07-01; 251115; 10.1186/s12911-025-03086-5; Risk factors of breast cancer patients developing multiple primary cancers: a retrospective study and establishing/testing of machine learning models; Yudi Jin [0], Tong Su [1], Yanjia Fan [2], Yineng Zheng [3], Cheng Tian [4], Zubin Ouyang [5], Fajin Lv [6]; Department of Radiology, The First Affiliated Hospital of Chongqing Medical University; Department of Radiology, The First Affiliated Hospital of Chongqing Medical University; Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University; Department of Radiology, The First Affiliated Hospital of Chongqing Medical University; Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University; Department of Radiology, The First Affiliated Hospital of Chongqing Medical University; Department of Radiology, The First Affiliated Hospital of Chongqing Medical University; Abstract Background: Breast cancer is a prevalent malignancy globally, with approximately 1 in 10 breast cancer patients at risk of developing additional primary malignant tumors. This study seeks to explore the risk factors linked to the development of multiple primary cancers (MPCs) in breast cancer patients and to develop predictive models to aid in clinical decision-making. Methods: A cohort of patients from the Surveillance, Epidemiology, and End Results (SEER) database was analyzed to identify key factors contributing to the occurrence of MPCs. Machine learning models, including logistic regression and random forest, were established and tested to predict the risk of developing multiple primary cancers. Results: A total of 120,434 breast cancer patients were included in the study. After random undersampling of the majority class and random selection of a quarter of the population, there were 3432 patients in each of the one primary breast cancer (OPBC) group and the MPCs group. A logistic regression model and a random forest model were constructed based on age, marital status, laterality, histological type, tumor grade, American Joint Committee on Cancer (AJCC) stage, T and N stage, molecular subtype, surgery, chemotherapy, and radiotherapy. The logistic regression model achieved an area under the curve (AUC) of 0.902, a specificity of 0.905, and a sensitivity of 0.767 in the training set, and an AUC of 0.886, a specificity of 0.882, and a sensitivity of 0.782 in the testing set. The random forest model achieved an AUC of 0.955, a specificity of 0.916, and a sensitivity of 0.859 in the training set, and an AUC of 0.874, a specificity of 0.858, and a sensitivity of 0.769 in the testing set. A nomogram was plotted based on the logistic regression model. The Kaplan-Meier (K-M) curves demonstrated statistically significant differences in prognosis among the various risk groups that were stratified based on the nomogram. Conclusions: This study assessed several risk factors influencing the development of MPCs in breast cancer patients. The machine learning model could offer a practical tool for personalized risk assessment in this patient population.; https://doi.org/10.1186/s12911-025-03086-5; Breast cancer; Multiple primary cancer; Risk factors; Random forest; Logistic regression |
| spellingShingle | Yudi Jin; Tong Su; Yanjia Fan; Yineng Zheng; Cheng Tian; Zubin Ouyang; Fajin Lv; Risk factors of breast cancer patients developing multiple primary cancers: a retrospective study and establishing/testing of machine learning models; BMC Medical Informatics and Decision Making; Breast cancer; Multiple primary cancer; Risk factors; Random forest; Logistic regression |
| title | Risk factors of breast cancer patients developing multiple primary cancers: a retrospective study and establishing/testing of machine learning models |
| title_full | Risk factors of breast cancer patients developing multiple primary cancers: a retrospective study and establishing/testing of machine learning models |
| title_fullStr | Risk factors of breast cancer patients developing multiple primary cancers: a retrospective study and establishing/testing of machine learning models |
| title_full_unstemmed | Risk factors of breast cancer patients developing multiple primary cancers: a retrospective study and establishing/testing of machine learning models |
| title_short | Risk factors of breast cancer patients developing multiple primary cancers: a retrospective study and establishing/testing of machine learning models |
| title_sort | risk factors of breast cancer patients developing multiple primary cancers a retrospective study and establishing testing of machine learning models |
| topic | Breast cancer; Multiple primary cancer; Risk factors; Random forest; Logistic regression |
| url | https://doi.org/10.1186/s12911-025-03086-5 |
| work_keys_str_mv | AT yudijin riskfactorsofbreastcancerpatientsdevelopingmultipleprimarycancersaretrospectivestudyandestablishingtestingofmachinelearningmodels AT tongsu riskfactorsofbreastcancerpatientsdevelopingmultipleprimarycancersaretrospectivestudyandestablishingtestingofmachinelearningmodels AT yanjiafan riskfactorsofbreastcancerpatientsdevelopingmultipleprimarycancersaretrospectivestudyandestablishingtestingofmachinelearningmodels AT yinengzheng riskfactorsofbreastcancerpatientsdevelopingmultipleprimarycancersaretrospectivestudyandestablishingtestingofmachinelearningmodels AT chengtian riskfactorsofbreastcancerpatientsdevelopingmultipleprimarycancersaretrospectivestudyandestablishingtestingofmachinelearningmodels AT zubinouyang riskfactorsofbreastcancerpatientsdevelopingmultipleprimarycancersaretrospectivestudyandestablishingtestingofmachinelearningmodels AT fajinlv riskfactorsofbreastcancerpatientsdevelopingmultipleprimarycancersaretrospectivestudyandestablishingtestingofmachinelearningmodels |
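
The abstract in this record mentions random undersampling of the majority class and reports AUC, sensitivity, and specificity for each model. The sketch below illustrates how those quantities are commonly computed, using only the Python standard library. It is not the authors' code: the function names, the `mpc` label key, the 0.5 threshold, and the toy scores are all assumptions made for illustration, and none of the numbers come from the study.

```python
import random


def undersample(records, label_key="mpc", seed=0):
    """Randomly drop majority-class records until both classes are equal in size.

    `label_key` is a hypothetical field name (1 = MPCs group, 0 = OPBC group).
    """
    rng = random.Random(seed)
    pos = [r for r in records if r[label_key] == 1]
    neg = [r for r in records if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) when predicting 1 for score >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return correct / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

In practice the scores would come from a fitted classifier such as a logistic regression or random forest, and the metrics would be computed separately on the training and testing sets, as the abstract reports.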