Efficient diagnosis of diabetes mellitus using an improved ensemble method

Abstract Diabetes is a growing health concern in developing countries, causing considerable mortality rates. While machine learning (ML) approaches have been widely used to improve early detection and treatment, several studies have shown low classification accuracies due to overfitting, underfittin...

Bibliographic Details
Main Authors: Blessing Oluwatobi Olorunfemi, Adewale Opeoluwa Ogunde, Ahmad Almogren, Abidemi Emmanuel Adeniyi, Sunday Adeola Ajagbe, Salil Bharany, Ayman Altameem, Ateeq Ur Rehman, Asif Mehmood, Habib Hamam
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects: Classification; Diabetes Mellitus; Gradient boosting; Random forest; XG boost
Online Access: https://doi.org/10.1038/s41598-025-87767-1
author Blessing Oluwatobi Olorunfemi
Adewale Opeoluwa Ogunde
Ahmad Almogren
Abidemi Emmanuel Adeniyi
Sunday Adeola Ajagbe
Salil Bharany
Ayman Altameem
Ateeq Ur Rehman
Asif Mehmood
Habib Hamam
collection DOAJ
description Abstract Diabetes is a growing health concern in developing countries and a cause of considerable mortality. While machine learning (ML) approaches have been widely used to improve early detection and treatment, several studies have reported low classification accuracy due to overfitting, underfitting, and data noise. This research employs parallel and sequential ensemble ML approaches paired with feature selection techniques to boost classification accuracy. The Pima Indians Diabetes dataset from the UCI ML Repository served as the data source. Preprocessing consisted of cleaning the dataset by replacing missing values with column means and selecting highly correlated features using forward and backward selection methods. The dataset was split into two parts: training (70%) and testing (30%). Classification was carried out in Python in Jupyter Notebook, in two design phases. The first phase used J48, Classification and Regression Tree (CART), and Decision Stump (DS) to create a random forest model. The second phase employed the same algorithms alongside sequential ensemble methods (XGBoost, AdaBoostM1, and Gradient Boosting), using an average voting algorithm for binary classification. Evaluation revealed that XGBoost, AdaBoostM1, and Gradient Boosting achieved classification accuracies of 100%, with F1 score, MCC, Precision, Recall, AUC-ROC, and AUC-PR all equal to 1.00, indicating reliable predictions of diabetes presence. Researchers and practitioners can leverage the predictive model developed in this work to make quick predictions of diabetes mellitus, which could save many lives.
format Article
id doaj-art-33dfbc5c58c1455c98671f1d44fa4445
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-33dfbc5c58c1455c98671f1d44fa4445
2025-01-26T12:29:45Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-01-01
Volume 15, issue 1, pages 1-23
10.1038/s41598-025-87767-1
Efficient diagnosis of diabetes mellitus using an improved ensemble method
Blessing Oluwatobi Olorunfemi (Department of Computer Science, Faculty of Natural Sciences, Redeemer’s University)
Adewale Opeoluwa Ogunde (Department of Computer Science, Faculty of Natural Sciences, Redeemer’s University)
Ahmad Almogren (Department of Computer Science, College of Computer and Information Sciences, King Saud University)
Abidemi Emmanuel Adeniyi (College of Computing and Communication Studies, Bowen University)
Sunday Adeola Ajagbe (Department of Computer Science, University of Zululand)
Salil Bharany (Chitkara University Institute of Engineering and Technology, Chitkara University)
Ayman Altameem (Department of Natural and Engineering Sciences, College of Applied Studies and Community Services, King Saud University)
Ateeq Ur Rehman (School of Computing, Gachon University)
Asif Mehmood (Department of Biomedical Engineering, College of IT Convergence, Gachon University)
Habib Hamam (Faculty of Engineering, Université de Moncton)
title Efficient diagnosis of diabetes mellitus using an improved ensemble method
topic Classification
Diabetes Mellitus
Gradient boosting
Random forest
XG boost
url https://doi.org/10.1038/s41598-025-87767-1
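The abstract in this record names three mechanical steps: replacing missing values with column means, a 70/30 train/test split, and average voting over several classifiers. The record does not include the authors' code, so the following is only a minimal standard-library sketch of those three steps. All function names are hypothetical, missing values are assumed to be represented as None, and the probabilities passed to the voting function are made-up stand-ins for the outputs of trained models.

```python
import random
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def split_70_30(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle row indices reproducibly and hold out test_fraction for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def average_vote(positive_probabilities, threshold=0.5):
    """Average the per-model probabilities of the positive class, then threshold.

    Returns (predicted_label, averaged_probability).
    """
    avg = mean(positive_probabilities)
    return int(avg >= threshold), avg


if __name__ == "__main__":
    # Toy rows with one missing value; not taken from the real dataset.
    data = [[148.0, 33.6], [85.0, None], [183.0, 23.3], [89.0, 28.1]]
    cleaned = impute_column_means(data)
    (train_x, train_y), (test_x, test_y) = split_70_30(cleaned, [1, 0, 1, 0])
    # Hypothetical probabilities from three already-trained models for one patient.
    label, prob = average_vote([0.9, 0.6, 0.3])
    print(cleaned, len(train_x), len(test_x), label, prob)
```

In practice the probabilities would come from fitted ensemble members, and the split would usually be stratified by class; both are left out here to keep the sketch self-contained.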