Efficient diagnosis of diabetes mellitus using an improved ensemble method
Abstract: Diabetes is a growing health concern in developing countries and a considerable cause of mortality. While machine learning (ML) approaches have been widely used to improve early detection and treatment, several studies have shown low classification accuracies due to overfitting, underfittin...
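Main Authors: | Blessing Oluwatobi Olorunfemi, Adewale Opeoluwa Ogunde, Ahmad Almogren, Abidemi Emmanuel Adeniyi, Sunday Adeola Ajagbe, Salil Bharany, Ayman Altameem, Ateeq Ur Rehman, Asif Mehmood, Habib Hamam |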
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Subjects: | Classification; Diabetes Mellitus; Gradient boosting; Random forest; XG boost |
Online Access: | https://doi.org/10.1038/s41598-025-87767-1 |
_version_ | 1832585764692557824 |
---|---|
author | Blessing Oluwatobi Olorunfemi; Adewale Opeoluwa Ogunde; Ahmad Almogren; Abidemi Emmanuel Adeniyi; Sunday Adeola Ajagbe; Salil Bharany; Ayman Altameem; Ateeq Ur Rehman; Asif Mehmood; Habib Hamam |
author_sort | Blessing Oluwatobi Olorunfemi |
collection | DOAJ |
description | Abstract: Diabetes is a growing health concern in developing countries and a considerable cause of mortality. While machine learning (ML) approaches have been widely used to improve early detection and treatment, several studies have reported low classification accuracies due to overfitting, underfitting, and data noise. This research employs parallel and sequential ensemble ML approaches paired with feature selection techniques to boost classification accuracy. The Pima Indians Diabetes dataset from the UCI ML Repository served as the dataset. Data preprocessing included cleaning the dataset by replacing missing values with column means and selecting highly correlated features using forward and backward selection methods. The dataset was split into two parts: training (70%) and testing (30%). Classification was implemented in Python in Jupyter Notebook, in two design phases. The first phase used J48, Classification and Regression Tree (CART), and Decision Stump (DS) to create a random forest model. The second phase employed the same algorithms alongside the sequential ensemble methods XGBoost, AdaBoostM1, and Gradient Boosting, using an average voting algorithm for binary classification. Evaluation showed that XGBoost, AdaBoostM1, and Gradient Boosting achieved classification accuracies of 100%, with F1 score, MCC, Precision, Recall, AUC-ROC, and AUC-PR all equal to 1.00, indicating reliable predictions of diabetes presence. Researchers and practitioners can use the predictive model developed in this work to make quick predictions of diabetes mellitus, which could save many lives. |
format | Article |
id | doaj-art-33dfbc5c58c1455c98671f1d44fa4445 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-33dfbc5c58c1455c98671f1d44fa4445; 2025-01-26T12:29:45Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 151123; 10.1038/s41598-025-87767-1; Efficient diagnosis of diabetes mellitus using an improved ensemble method; Blessing Oluwatobi Olorunfemi (0: Department of Computer Science, Faculty of Natural Sciences, Redeemer’s University); Adewale Opeoluwa Ogunde (1: Department of Computer Science, Faculty of Natural Sciences, Redeemer’s University); Ahmad Almogren (2: Department of Computer Science, College of Computer and Information Sciences, King Saud University); Abidemi Emmanuel Adeniyi (3: College of Computing and Communication Studies, Bowen University); Sunday Adeola Ajagbe (4: Department of Computer Science, University of Zululand); Salil Bharany (5: Chitkara University Institute of Engineering and Technology, Chitkara University); Ayman Altameem (6: Department of Natural and Engineering Sciences, College of Applied Studies and Community Services, King Saud University); Ateeq Ur Rehman (7: School of Computing, Gachon University); Asif Mehmood (8: Department of Biomedical Engineering, College of IT Convergence, Gachon University); Habib Hamam (9: Faculty of Engineering, Université de Moncton); https://doi.org/10.1038/s41598-025-87767-1; Classification; Diabetes Mellitus; Gradient boosting; Random forest; XG boost |
title | Efficient diagnosis of diabetes mellitus using an improved ensemble method |
topic | Classification; Diabetes Mellitus; Gradient boosting; Random forest; XG boost |
url | https://doi.org/10.1038/s41598-025-87767-1 |
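
The preprocessing, splitting, voting, and scoring steps named in the description can be illustrated with a minimal standard-library Python sketch. It is not the authors' code: this record does not include their implementation, so every function name (`impute_column_means`, `split_train_test`, `average_vote`, `binary_metrics`), the use of `None` to mark a missing value, and the 0.5 decision threshold are assumptions made for illustration. The classifiers themselves (random forest, XGBoost, AdaBoostM1, Gradient Boosting) are not implemented here; the sketch only assumes each one can output a positive-class probability per sample, and the probabilities in the demo are made-up toy values, not results from the paper.

```python
import math
import random


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def split_train_test(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle sample indices and hold out `test_fraction` of them for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def average_vote(model_probabilities, threshold=0.5):
    """Average per-sample positive-class probabilities across models,
    then label a sample 1 when the average reaches the threshold."""
    n_models = len(model_probabilities)
    averaged = [sum(p) / n_models for p in zip(*model_probabilities)]
    return [1 if p >= threshold else 0 for p in averaged]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and MCC from the confusion matrix.

    Ratios with a zero denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy feature rows with missing entries (illustrative values only).
    print(impute_column_means([[1.0, None], [3.0, 4.0], [None, 8.0]]))

    # Toy probabilities from three hypothetical models on four test samples.
    votes = average_vote([[0.9, 0.2, 0.6, 0.4],
                          [0.8, 0.1, 0.3, 0.7],
                          [0.7, 0.3, 0.5, 0.2]])
    print(votes, binary_metrics([1, 0, 1, 0], votes))
```

Averaging probabilities (soft voting) is one common reading of "average voting"; the record does not say whether the authors averaged probabilities or class labels, so that choice is also an assumption. AUC-ROC and AUC-PR are left out because they need ranked scores rather than hard labels.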