Advancing breast cancer prediction: Comparative analysis of ML models and deep learning-based multi-model ensembles on original and synthetic datasets.
Breast cancer is a significant global health concern with rising incidence and mortality rates. Current diagnostic methods face challenges, necessitating improved approaches. This study employs various machine learning (ML) algorithms, including KNN, SVM, ANN, RF, XGBoost, ensemble models, AutoML, a...
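| Main Authors: | Kazi Arman Ahmed, Israt Humaira, Ashiqur Rahman Khan, Md Shamim Hasan, Mukitul Islam, Anik Roy, Mehrab Karim, Mezbah Uddin, Ashique Mohammad, Md Doulotuzzaman Xames |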
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0326221 |
| _version_ | 1850207972219682816 |
|---|---|
| author | Kazi Arman Ahmed; Israt Humaira; Ashiqur Rahman Khan; Md Shamim Hasan; Mukitul Islam; Anik Roy; Mehrab Karim; Mezbah Uddin; Ashique Mohammad; Md Doulotuzzaman Xames |
| author_sort | Kazi Arman Ahmed |
| collection | DOAJ |
| description | Breast cancer is a significant global health concern with rising incidence and mortality rates. Current diagnostic methods face challenges, necessitating improved approaches. This study employs various machine learning (ML) algorithms, including KNN, SVM, ANN, RF, XGBoost, ensemble models, AutoML, and deep learning (DL) techniques, to enhance breast cancer diagnosis. The objective is to compare the efficiency and accuracy of these models on original and synthetic datasets. The methodology comprises three phases, each with two stages. In the first stage of each phase, stratified K-fold cross-validation was performed to train and evaluate multiple ML models. The second stage involved DL-based and AutoML-based ensemble strategies to improve prediction accuracy. In the second and third phases, synthetic data generation methods, such as Gaussian Copula and TVAE, were used. The KNN model outperformed the others on the original dataset, while the AutoML approach using H2OXGBoost on synthetic data also showed high accuracy. These findings underscore the effectiveness of traditional ML models and AutoML in predicting breast cancer. Additionally, the study demonstrated the potential of synthetic data generation methods to improve prediction performance, aiding decision-making in the diagnosis and treatment of breast cancer. |
| format | Article |
| id | doaj-art-a7ed8a20a57548ff958a640025aea73d |
| institution | OA Journals |
| issn | 1932-6203 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Public Library of Science (PLoS) |
| record_format | Article |
| series | PLoS ONE |
| spelling | doaj-art-a7ed8a20a57548ff958a640025aea73d; 2025-08-20T02:10:20Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 6; e0326221; 10.1371/journal.pone.0326221 |
| title | Advancing breast cancer prediction: Comparative analysis of ML models and deep learning-based multi-model ensembles on original and synthetic datasets. |
| url | https://doi.org/10.1371/journal.pone.0326221 |
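
The description above says that stratified K-fold cross-validation was used to train and evaluate the ML models, with KNN among them. The sketch below illustrates that general evaluation pattern only; it is not the authors' code. It uses the Python standard library rather than any particular ML package, and the toy two-blob data, the function names, the fold count of 5, and `n_neighbors=5` are all assumptions introduced for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets a near-equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def knn_predict(train_x, train_y, point, n_neighbors=5):
    """Classify one point by majority vote among its nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], point))
    votes = Counter(label for _, label in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate_knn(features, labels, k=5, n_neighbors=5):
    """Return the per-fold accuracy of k-NN under stratified k-fold cross-validation."""
    scores = []
    for train_idx, test_idx in stratified_k_fold(labels, k):
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, features[i], n_neighbors) == labels[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


def make_toy_data(n_class0=60, n_class1=40, seed=42):
    """Toy two-feature data (NOT the study's dataset): two Gaussian blobs."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n_class0):
        features.append((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
        labels.append(0)
    for _ in range(n_class1):
        features.append((rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)))
        labels.append(1)
    return features, labels


if __name__ == "__main__":
    X, y = make_toy_data()
    fold_scores = cross_validate_knn(X, y, k=5, n_neighbors=5)
    print("per-fold accuracy:", [round(s, 3) for s in fold_scores])
    print("mean accuracy:", round(sum(fold_scores) / len(fold_scores), 3))
```

Stratifying keeps the class ratio of each test fold close to that of the whole dataset, which matters when one diagnosis class is rarer than the other. Comparing several models, as the study does, would amount to running the same folds with a different classifier in place of `knn_predict`.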