Optimizing machine learning algorithms for diabetes data: A metaheuristic approach to balancing and tuning classifiers parameters
Diabetes mellitus poses a global health concern, prompting the development of machine learning algorithms designed to construct a model for the accurate classification of patients, enabling precise diagnoses and early-stage treatment. However, the efficacy of classifying diabetes patients through ma...
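| Main Authors: | Hauwau Abdulrahman Aliyu, Ibrahim Olawale Muritala, Habeeb Bello-Salau, Salisu Mohammed, Adeiza James Onumanyi, Ore-Ofe Ajayi |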
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2024-09-01 |
| Series: | Franklin Open |
| Subjects: | Bioinformatics; Biotechnology; Computational genomics; Machine learning; Metaheuristic algorithm |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2773186324000835 |
| _version_ | 1850178296241717248 |
|---|---|
| author | Hauwau Abdulrahman Aliyu Ibrahim Olawale Muritala Habeeb Bello-Salau Salisu Mohammed Adeiza James Onumanyi Ore-Ofe Ajayi |
| author_sort | Hauwau Abdulrahman Aliyu |
| collection | DOAJ |
| description | Diabetes mellitus poses a global health concern, prompting the development of machine learning algorithms designed to construct a model for the accurate classification of patients, enabling precise diagnoses and early-stage treatment. However, the efficacy of classifying diabetes patients through machine learning relies on datasets that are often plagued by imbalance, leading to biased classification and inaccurate diagnoses. Previous research attempts, employing techniques like random sampling (under-sampling and oversampling) and the Synthetic Minority Oversampling Technique (SMOTE), have struggled to achieve optimally balanced datasets. Additionally, setting the best parameters for machine learning classifiers remains a challenging task. To address these issues, this research focuses on devising a metaheuristic optimization approach tailored for diabetes data balancing and classifier hyperparameter tuning. It leverages the Particle Swarm Optimization (PSO) algorithm for diabetes data balancing and a genetic algorithm to select the optimal architecture for various machine learning classifiers. The study compares the performance of the proposed metaheuristic data balancer and classifier architecture parameter tuner using classification metrics (F1 score, Average Precision–Recall (APR), AUC, and accuracy). The PSO-balanced dataset emerges as the most effective in classifying diabetes, with an Average Percentage Improvement (API) in classification performance metrics of 20.78% in accuracy, 16.79% in area under the receiver operating characteristic curve, and a significant 32.78% in APR. Moreover, the XGBoost classifier tuned with a genetic algorithm requires the least computational training time on the Centers for Disease Control and Prevention (CDC) diabetes dataset compared to the artificial neural network and random forest classifiers. Notably, the imbalanced CDC diabetes dataset exhibits the lowest APR compared to the datasets balanced with random under-sampling and the PSO data balancing technique. |
| format | Article |
| id | doaj-art-c14cf757dba6448fa9db2910406e0c0a |
| institution | OA Journals |
| issn | 2773-1863 |
| language | English |
| publishDate | 2024-09-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Franklin Open |
| spelling | doaj-art-c14cf757dba6448fa9db2910406e0c0a; 2025-08-20T02:18:47Z; eng; Elsevier; Franklin Open; ISSN 2773-1863; 2024-09-01; volume 8, article 100153; DOI 10.1016/j.fraope.2024.100153; Optimizing machine learning algorithms for diabetes data: A metaheuristic approach to balancing and tuning classifiers parameters; Hauwau Abdulrahman Aliyu (Department of Biochemistry and Molecular Biology, Federal University Birnin-Kebbi, 1157, Kebbi, Nigeria); Ibrahim Olawale Muritala (Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria; corresponding author); Habeeb Bello-Salau (Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria); Salisu Mohammed (Department of Maintenance Engineering, KRPC Ltd, Kaduna Nigerian National Petroleum Company, Kaduna, 800242, Nigeria); Adeiza James Onumanyi (AIoT, Next Generation Enterprises and Institutions, Council for Scientific and Industrial Research (CSIR), Pretoria, 0001, South Africa); Ore-Ofe Ajayi (Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria); http://www.sciencedirect.com/science/article/pii/S2773186324000835; Bioinformatics; Biotechnology; Computational genomics; Machine learning; Metaheuristic algorithm |
| title | Optimizing machine learning algorithms for diabetes data: A metaheuristic approach to balancing and tuning classifiers parameters |
| topic | Bioinformatics Biotechnology Computational genomics Machine learning Metaheuristic algorithm |
| url | http://www.sciencedirect.com/science/article/pii/S2773186324000835 |
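
The description above mentions a genetic algorithm used to select classifier parameters. As a rough illustration of that general idea only, the sketch below runs a small genetic search (tournament selection, uniform crossover, random-reset mutation, elitism) over a two-parameter search space. Everything specific in it is an assumption made for illustration: the parameter names `max_depth` and `n_estimators`, their ranges, the `toy_fitness` function (a stand-in for a cross-validated score such as APR), and the population settings. None of these come from the article, and this is not the authors' implementation.

```python
import random

# Hypothetical search space; names and ranges are illustrative, not from the article.
SEARCH_SPACE = {
    "max_depth": (2, 12),
    "n_estimators": (50, 500),
}


def toy_fitness(params):
    """Stand-in for a validation score; highest (0) at max_depth=6, n_estimators=300."""
    depth_term = (params["max_depth"] - 6) ** 2
    trees_term = ((params["n_estimators"] - 300) / 50) ** 2
    return -(depth_term + trees_term)


def random_individual(rng):
    # Draw each parameter uniformly from its inclusive integer range.
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is copied from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    # Random-reset mutation: with probability `rate`, resample a parameter.
    out = dict(individual)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def tournament(population, fitness, rng, k=3):
    # Pick k distinct candidates at random and keep the fittest.
    return max(rng.sample(population, k), key=fitness)


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        children = [best]  # elitism: the current best always survives
        while len(children) < pop_size:
            parent_a = tournament(population, fitness, rng)
            parent_b = tournament(population, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_params, history = genetic_search(toy_fitness)
print(best_params, history[-1])
```

In a real tuning run, `toy_fitness` would be replaced by a function that trains a classifier with the candidate parameters and returns a held-out metric. Because of elitism, the best score in `history` never decreases from one generation to the next.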