A novel aggregated coefficient ranking based feature selection strategy for enhancing the diagnosis of breast cancer classification using machine learning

Abstract Effective Breast cancer (BC) analysis is crucial for early prognosis, controlling cancer recurrence, timely medical intervention, and determining appropriate treatment procedures. Additionally, it plays a significant role in optimizing mortality rates among women with breast cancer and incr...

Bibliographic Details
Main Authors: E. Sreehari, L. D. Dhinesh Babu
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-87826-7
collection DOAJ
description Abstract Effective breast cancer (BC) analysis is crucial for early prognosis, controlling cancer recurrence, timely medical intervention, and determining appropriate treatment procedures. It also plays a significant role in reducing mortality rates among women with breast cancer and increasing the average lifespan of patients. This can be achieved by analyzing the critical features of BC and picking the best of them through ranking-based Feature Selection (FS). Various authors have developed strategies that rely on a single FS method, but this approach may not yield excellent results and can lead to time and storage complexity issues, inaccurate results, poor decision-making, and models that are difficult to interpret. Critical data analysis can therefore facilitate the development of a robust ranking methodology for effective feature selection. To solve these problems, this paper suggests a new method called Aggregated Coefficient Ranking-based Feature Selection (ACRFS), which is based on tri-characteristic behavioral criteria. This strategy aims to significantly improve the ranking for an effective Attribute Subset Selection (ASSS). The proposed method uses chi-square, mutual information, correlation, and rank-dense methods. The work implemented the methodology on Wisconsin-based breast cancer data and applied the Synthetic Minority Oversampling Technique (SMOTE) to the obtained data subset. Later, models such as decision trees, support vector machines, k-nearest neighbors, random forests, stochastic gradient descent, and Gaussian naive Bayes were employed to determine the type of cancer. Classification metrics such as accuracy, precision, recall, F1 score, kappa score, and the Matthews correlation coefficient were used to evaluate the effectiveness of the suggested ACRFS approach. The proposed method has demonstrated superior outcomes with fewer features and minimal time complexity.
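The abstract names the scoring criteria (chi-square, mutual information, correlation) and a dense-ranking step, but it does not say how the per-criterion results are combined. The following is only a minimal sketch of one plausible reading, not the authors' ACRFS procedure: it assumes each criterion's scores are dense-ranked (ties share a rank, no gaps) and that the ranks are then averaged per feature. The function names and all score values are hypothetical; the feature names are used purely for illustration.

```python
from statistics import mean


def dense_rank(scores):
    """Dense-rank features by score: the highest score gets rank 1,
    tied scores share a rank, and no rank numbers are skipped."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    return {name: rank_of[value] for name, value in scores.items()}


def aggregate_ranks(criteria):
    """Average each feature's dense rank over all criteria (lower is better).
    Averaging is an assumption; the abstract does not state the aggregation rule."""
    per_criterion = [dense_rank(scores) for scores in criteria.values()]
    features = per_criterion[0].keys()
    return {f: mean(ranks[f] for ranks in per_criterion) for f in features}


def select_top_k(criteria, k):
    """Return the k features with the best (lowest) aggregated rank.
    Ties are broken alphabetically so the result is deterministic."""
    aggregated = aggregate_ranks(criteria)
    return sorted(aggregated, key=lambda f: (aggregated[f], f))[:k]


# Hypothetical scores, invented for illustration only (not results from the paper).
criteria = {
    "chi_square": {"radius": 9.0, "texture": 4.0, "area": 9.0, "symmetry": 1.0},
    "mutual_information": {"radius": 0.40, "texture": 0.10, "area": 0.35, "symmetry": 0.05},
    "abs_correlation": {"radius": 0.70, "texture": 0.40, "area": 0.75, "symmetry": 0.30},
}

print(select_top_k(criteria, 2))
```

In practice the three score dictionaries would come from a statistics library rather than being typed in by hand, and the selected subset would then be passed on to oversampling and classification as the abstract describes.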
id doaj-art-251c0ba9aa6b4eef805d571122f76b02
institution Kabale University
issn 2045-2322
spelling doaj-art-251c0ba9aa6b4eef805d571122f76b02; 2025-02-09T12:35:32Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-02-01; 151117; 10.1038/s41598-025-87826-7; A novel aggregated coefficient ranking based feature selection strategy for enhancing the diagnosis of breast cancer classification using machine learning; E. Sreehari (School of Computer Science Engineering and Information Systems, Vellore Institute of Technology); L. D. Dhinesh Babu (School of Computer Science Engineering and Information Systems, Vellore Institute of Technology); https://doi.org/10.1038/s41598-025-87826-7