An integrated approach of feature selection and machine learning for early detection of breast cancer
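| Main Authors: | Jing Zhu, Zhenhang Zhao, Bangzheng Yin, Canpeng Wu, Chan Yin, Rong Chen, Youde Ding |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Subjects: | Breast cancer; LightGBM; SHAP; Borderline-SMOTE1; RFE; PSO |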
| Online Access: | https://doi.org/10.1038/s41598-025-97685-x |

| collection | DOAJ |
|---|---|
| description | Breast cancer ranks among the most prevalent cancers in women globally, and the efficacy of its treatment relies heavily on early identification and diagnosis of the disease, which are essential for improving the survival prospects of patients. With the increasing application of machine learning in medicine, algorithm-based diagnostic tools offer new possibilities for the early prediction of breast cancer. In this study, we introduced a novel feature selection approach that uses Shapley additive explanation (SHAP) values as the ranking criterion for Recursive Feature Elimination (RFE), with a Random Forest (RF) algorithm as the estimator within the RFE framework. To address class imbalance in the data, we incorporated Borderline-SMOTE1. The efficacy of the proposed method was assessed on the Wisconsin Breast Cancer Diagnosis (WBCD) dataset using five machine learning models, K-Nearest Neighbor (KNN), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and Light Gradient Boosting Machine (LightGBM), and the hyperparameters of all five models were optimized with the Particle Swarm Optimization (PSO) algorithm. Using the 26 features selected by the proposed algorithm, the LightGBM-PSO model performed best, differentiating between benign and malignant cases with an accuracy of 99.0%, a specificity and precision of 100%, a recall of 97.40%, an F-measure of 98.68%, an AUC of 0.9870, and a 10-fold cross-validation accuracy of 0.9808. We subsequently developed an online tool based on this model for predicting breast cancer risk (https://breast-cancer-prediction-tool-cgbjlhkns7yig6bmzvztmc.streamlit.app/). Feature selection with the proposed algorithm, combined with PSO optimization of the LightGBM model, can significantly enhance the accuracy of breast cancer prediction and could potentially improve the prognosis of patients diagnosed with breast cancer. |
| id | doaj-art-fdcb2a6a85f641518bcba71ff0c353e5 |
| institution | OA Journals |
| issn | 2045-2322 |
| affiliations | Jing Zhu: Experimental Centre, Guangzhou University; Zhenhang Zhao: Electronics and Communication Engineering, Guangzhou University; Bangzheng Yin: Institute of Information Engineering, Guangzhou Railway Polytechnic; Canpeng Wu: Electronics and Communication Engineering, Guangzhou University; Chan Yin: The Central Hospital of Shaoyang; Rong Chen: The Central Hospital of Shaoyang; Youde Ding: School of Biomedical Engineering, Guangzhou Medical University |
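The description names Borderline-SMOTE1 as the remedy for class imbalance but gives no settings for it. As an illustration only, the sketch below implements the general Borderline-SMOTE1 idea in dependency-free Python, following Han et al. (2005): a minority sample is flagged as borderline ("DANGER") when at least half, but not all, of its m nearest neighbours in the training set belong to the majority class, and new samples are interpolated only between DANGER samples and their k nearest minority neighbours. This is not the authors' code. The function `borderline_smote1`, its parameters (`m`, `k`, `n_per_danger`, `seed`) and the toy points are hypothetical, and a maintained implementation such as imbalanced-learn's `BorderlineSMOTE(kind="borderline-1")` would normally be preferred in practice.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def borderline_smote1(
    minority: Sequence[Point],
    majority: Sequence[Point],
    n_per_danger: int = 1,
    m: int = 5,
    k: int = 5,
    seed: int = 0,
) -> Tuple[List[int], List[Point]]:
    """Hypothetical illustration of Borderline-SMOTE1, not the study's implementation.

    Returns (indices of DANGER minority samples, list of synthetic minority samples).
    """
    rng = random.Random(seed)
    # Whole training set with labels (1 = minority, 0 = majority).
    # Minority samples come first, so minority index i is also data index i.
    data = [(x, 1) for x in minority] + [(x, 0) for x in majority]

    # Step 1: flag borderline ("DANGER") minority samples.
    danger: List[int] = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(data)) if j != i]
        neigh = sorted(others, key=lambda j: math.dist(p, data[j][0]))[:m]
        m_prime = sum(1 for j in neigh if data[j][1] == 0)
        # m_prime == m is treated as noise, m_prime < m / 2 as safe.
        if m / 2 <= m_prime < m:
            danger.append(i)

    # Step 2: interpolate each DANGER sample towards minority neighbours only
    # (restricting to minority neighbours is what makes this variant "SMOTE1").
    synthetic: List[Point] = []
    for i in danger:
        p = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        nn = sorted(others, key=lambda j: math.dist(p, minority[j]))[:k]
        if not nn:
            continue
        for _ in range(n_per_danger):
            q = minority[rng.choice(nn)]
            gap = rng.random()  # one scalar, so the new point lies on segment p-q
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return danger, synthetic


# Toy 2-D data (hypothetical): three "safe" minority points far from the
# majority class and two minority points sitting next to majority points.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.2)]
majority = [(5.4, 5.0), (5.0, 4.3), (6.0, 6.0), (9.0, 9.0), (9.0, 8.0), (8.0, 9.0)]
danger, synthetic = borderline_smote1(minority, majority, n_per_danger=3, m=3, k=1, seed=42)
print(danger)  # [3, 4]
print(len(synthetic))  # 6
```

Only the two minority points surrounded by majority neighbours are oversampled, which is the intended difference from plain SMOTE. In a pipeline like the one described, oversampling would normally be applied to the training split only, so that synthetic points never reach the held-out evaluation data.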