Advancing breast cancer prediction: Comparative analysis of ML models and deep learning-based multi-model ensembles on original and synthetic datasets.

Breast cancer is a significant global health concern with rising incidence and mortality rates. Current diagnostic methods face challenges, necessitating improved approaches. This study employs various machine learning (ML) algorithms, including KNN, SVM, ANN, RF, XGBoost, ensemble models, AutoML, a...

Bibliographic Details
Main Authors: Kazi Arman Ahmed, Israt Humaira, Ashiqur Rahman Khan, Md Shamim Hasan, Mukitul Islam, Anik Roy, Mehrab Karim, Mezbah Uddin, Ashique Mohammad, Md Doulotuzzaman Xames
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0326221
Collection: DOAJ
Description: Breast cancer is a significant global health concern with rising incidence and mortality rates. Current diagnostic methods face challenges, necessitating improved approaches. This study employs various machine learning (ML) algorithms, including KNN, SVM, ANN, RF, XGBoost, ensemble models, AutoML, and deep learning (DL) techniques, to enhance breast cancer diagnosis. The objective is to compare the efficiency and accuracy of these models on original and synthetic datasets, contributing to the advancement of breast cancer diagnosis. The methodology comprises three phases, each with two stages. In the first stage of each phase, stratified K-fold cross-validation was performed to train and evaluate multiple ML models. The second stage applied DL-based and AutoML-based ensemble strategies to improve prediction accuracy. In the second and third phases, synthetic data generation methods, such as Gaussian Copula and TVAE, were used. The KNN model outperformed the others on the original dataset, while the AutoML approach with H2OXGBoost on synthetic data also showed high accuracy. These findings underscore the effectiveness of traditional ML models and AutoML in predicting breast cancer. Additionally, the study demonstrated the potential of synthetic data generation methods to improve prediction performance, aiding decision-making in the diagnosis and treatment of breast cancer.
ISSN: 1932-6203
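
The abstract states that stratified K-fold cross-validation was used to train and evaluate the ML models, with KNN among them. The following is a minimal sketch of that general technique only, not the authors' implementation: the toy two-class data, the function names (`stratified_k_fold`, `knn_predict`, `cross_validate`, `make_toy_data`), the choice of 5 folds, and the choice of 3 neighbors are all assumptions made for illustration. The record does not specify the dataset, fold count, or hyperparameters used in the study.

```python
import math
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Majority vote among the n_neighbors nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = defaultdict(int)
    for i in order[:n_neighbors]:
        votes[train_y[i]] += 1
    return max(votes, key=votes.get)


def cross_validate(X, y, k=5, n_neighbors=3, seed=0):
    """Mean KNN accuracy across the stratified folds."""
    scores = []
    for held_out in stratified_k_fold(y, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=30, seed=0):
    """Hypothetical two-feature, two-class data; NOT the study's dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean accuracy: {cross_validate(X, y):.3f}")
```

Stratification matters most when the classes are imbalanced, as is common in diagnostic data: each held-out fold then keeps approximately the same class ratio as the full dataset, so per-fold accuracy is comparable across folds.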