Optimizing machine learning algorithms for diabetes data: A metaheuristic approach to balancing and tuning classifiers parameters

Bibliographic Details
Main Authors: Hauwau Abdulrahman Aliyu, Ibrahim Olawale Muritala, Habeeb Bello-Salau, Salisu Mohammed, Adeiza James Onumanyi, Ore-Ofe Ajayi
Format: Article
Language: English
Published: Elsevier 2024-09-01
Series: Franklin Open
Subjects: Bioinformatics, Biotechnology, Computational genomics, Machine learning, Metaheuristic algorithm
Online Access:http://www.sciencedirect.com/science/article/pii/S2773186324000835
collection DOAJ
description Diabetes mellitus is a global health concern, which has prompted the development of machine learning algorithms that build models to classify patients accurately, enabling precise diagnosis and early-stage treatment. However, the effectiveness of machine learning classification of diabetes patients depends on the underlying datasets, which are often imbalanced, leading to biased classification and inaccurate diagnoses. Previous attempts using random sampling (under-sampling and oversampling) and the Synthetic Minority Oversampling Technique (SMOTE) have struggled to produce optimally balanced datasets. In addition, selecting the best parameters for machine learning classifiers remains challenging. To address these issues, this research devises a metaheuristic optimization methodology for diabetes data balancing and classifier hyperparameter tuning: a Particle Swarm Optimization (PSO) algorithm balances the diabetes data, and a genetic algorithm selects the optimal architecture for various machine learning classifiers. The study evaluates the proposed metaheuristic data balancer and classifier architecture parameter tuner using classification metrics (F1 score, Average Precision–Recall (APR), AUC, and accuracy). The PSO-balanced dataset proves the most effective for classifying diabetes, with an Average Percentage Improvement (API) of 20.78% in accuracy, 16.79% in the area under the receiver operating characteristic curve, and a significant 32.78% in APR. Moreover, the XGBoost classifier tuned by the genetic algorithm requires the least training time on the Centers for Disease Control and Prevention (CDC) diabetes dataset, compared with the artificial neural network and random forest classifiers. Notably, the imbalanced CDC diabetes dataset yields the lowest APR, compared with the datasets balanced by random under-sampling and by PSO.
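The record does not say how the PSO balancer encodes a candidate dataset or what it optimizes. One common formulation, assumed here and not confirmed by the abstract, is binary PSO under-sampling: each particle is a keep/drop bit mask over the majority-class samples, a sigmoid of each velocity component gives the probability that the bit is 1, and fitness scores the resulting training set. The pure-Python sketch below is illustrative only; the nearest-centroid classifier (a dependency-free stand-in for the ANN, random forest, and XGBoost models), the "F1 minus imbalance penalty" fitness, and every function name and parameter value are hypothetical, not the authors' method.

```python
# Hypothetical sketch: binary PSO under-sampling of the majority class.
# NOT the authors' implementation; the encoding, the fitness function,
# and all parameter values are illustrative assumptions.
import math
import random


def nearest_centroid_f1(train, valid):
    """Fit a nearest-centroid classifier on `train` and return its F1 score
    for the positive class (label 1) on `valid`. Samples are (features, label)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        if not rows:
            return 0.0  # a class vanished from the subset: score it as useless
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    tp = fp = fn = 0
    for x, y in valid:
        pred = min(centroids, key=lambda c: math.dist(x, centroids[c]))
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def fitness(mask, majority, minority, valid):
    """Assumed objective: validation F1 minus a penalty for straying from 1:1."""
    kept = [s for s, bit in zip(majority, mask) if bit]
    imbalance = abs(len(kept) - len(minority)) / len(majority)
    return nearest_centroid_f1(kept + minority, valid) - imbalance


def pso_undersample(majority, minority, valid, n_particles=10, iters=30,
                    w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle holds one keep/drop bit per majority sample."""
    rng = random.Random(seed)
    n = len(majority)
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, majority, minority, valid) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp velocity
                # Sigmoid transfer: the velocity sets the probability of bit = 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i], majority, minority, valid)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [s for s, bit in zip(majority, gbest) if bit], gbest_fit


if __name__ == "__main__":
    rng = random.Random(1)

    def blob(mu, label, k):
        """Toy 2-D Gaussian cluster; synthetic data, not the CDC dataset."""
        return [([rng.gauss(mu, 1), rng.gauss(mu, 1)], label) for _ in range(k)]

    majority, minority = blob(0, 0, 60), blob(2, 1, 12)
    valid = blob(0, 0, 30) + blob(2, 1, 10)
    kept, score = pso_undersample(majority, minority, valid)
    print(f"kept {len(kept)} of {len(majority)} majority samples, fitness {score:.3f}")
```

In a fuller version, `nearest_centroid_f1` would be replaced by the actual classifier and a metric such as APR; the stand-in only keeps the sketch free of third-party dependencies.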
id doaj-art-c14cf757dba6448fa9db2910406e0c0a
institution OA Journals
issn 2773-1863
DOI: 10.1016/j.fraope.2024.100153
Volume: 8, article 100153
Affiliations:
Hauwau Abdulrahman Aliyu: Department of Biochemistry and Molecular Biology, Federal University Birnin-Kebbi, 1157, Kebbi, Nigeria
Ibrahim Olawale Muritala (corresponding author): Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria
Habeeb Bello-Salau: Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria
Salisu Mohammed: Department of Maintenance Engineering, KRPC Ltd, Kaduna Nigerian National Petroleum Company, Kaduna, 800242, Nigeria
Adeiza James Onumanyi: AIoT, Next Generation Enterprises and Institutions, Council for Scientific and Industrial Research (CSIR), Pretoria, 0001, South Africa
Ore-Ofe Ajayi: Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria
topic Bioinformatics
Biotechnology
Computational genomics
Machine learning
Metaheuristic algorithm