Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms

Abstract: Diabetes mellitus is considered one of the main causes of death worldwide. If diabetes is not diagnosed and treated early, it can cause several other health problems, such as kidney disease, nerve disease, vision problems, and brain issues. Early detection of diabetes reduces healthc...

Bibliographic Details
Main Authors: G. R. Ashisha, X. Anitha Mary, E. Grace Mary Kanaga, J. Andrew, R. Jennifer Eunice
Format: Article
Language: English
Published: Springer 2024-11-01
Series: International Journal of Computational Intelligence Systems
Subjects:
Online Access: https://doi.org/10.1007/s44196-024-00678-3
_version_ 1850062253109280768
author G. R. Ashisha
X. Anitha Mary
E. Grace Mary Kanaga
J. Andrew
R. Jennifer Eunice
collection DOAJ
description Abstract: Diabetes mellitus is considered one of the main causes of death worldwide. If diabetes is not diagnosed and treated early, it can cause several other health problems, such as kidney disease, nerve disease, vision problems, and brain issues. Early detection of diabetes reduces healthcare costs and minimizes the chance of serious complications. In this work, we propose an e-diagnostic model for diabetes classification via machine learning algorithms that can be executed on the Internet of Medical Things (IoMT). The study uses and analyses two benchmark datasets, the PIMA Indian Diabetes Dataset (PIDD) and the Behavioral Risk Factor Surveillance System (BRFSS) diabetes dataset, to classify diabetes. The proposed model combines random oversampling to balance the class distribution, interquartile range (IQR)-based outlier detection to eliminate outlier data, and the Boruta algorithm to select the most relevant features from the datasets. The proposed approach considers four ML algorithms, namely random forest, gradient boosting, light gradient boosting, and decision tree classifiers, as they are widely used classification algorithms for diabetes prediction. We evaluated all four ML algorithms via performance indicators such as accuracy, F1 score, recall, precision, and AUC-ROC. Comparative analysis suggests that the random forest algorithm outperforms the remaining classifiers, with the highest accuracy of 92% on the BRFSS diabetes dataset and 94% on the PIDD dataset, which is 3% higher than the accuracy reported in existing research. This research may help diabetologists develop accurate treatment regimens for patients with diabetes.
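As an illustration of two of the preprocessing steps named in the abstract, the following is a minimal standard-library Python sketch of IQR-based outlier removal and random oversampling. It is not the authors' implementation: the function names, the 1.5 x IQR fence multiplier, the quartile method, the toy data, and the order of the two steps are all assumptions, since the abstract does not specify them.

```python
import random
from collections import Counter
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR (k=1.5 is an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_iqr_outliers(rows, labels, k=1.5):
    """Keep only rows whose every feature lies inside that feature's IQR fences."""
    columns = list(zip(*rows))
    bounds = [iqr_bounds(col, k) for col in columns]
    kept = [
        (row, label)
        for row, label in zip(rows, labels)
        if all(lo <= x <= hi for x, (lo, hi) in zip(row, bounds))
    ]
    return [r for r, _ in kept], [l for _, l in kept]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data (not taken from PIDD or BRFSS): [glucose, bmi]
X = [[85, 22.0], [90, 24.5], [95, 23.1], [100, 26.0], [105, 25.2],
     [110, 27.3], [150, 31.0], [160, 33.5], [900, 30.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

# Assumed order: remove outliers first, then balance the classes.
X_clean, y_clean = drop_iqr_outliers(X, y)
X_bal, y_bal = random_oversample(X_clean, y_clean)
print(Counter(y_clean), Counter(y_bal))
```

In a real evaluation, oversampling would normally be applied only to the training split so that duplicated rows do not leak into the test data. Whether the study does this is not stated in the abstract.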
format Article
id doaj-art-ec83f8106c1d45c898f8f8841c0abc28
institution DOAJ
issn 1875-6883
language English
publishDate 2024-11-01
publisher Springer
record_format Article
series International Journal of Computational Intelligence Systems
spelling doaj-art-ec83f8106c1d45c898f8f8841c0abc28
2025-08-20T02:49:59Z
eng
Springer
International Journal of Computational Intelligence Systems
1875-6883
2024-11-01
17
1
1
17
10.1007/s44196-024-00678-3
Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms
G. R. Ashisha (0)
X. Anitha Mary (1)
E. Grace Mary Kanaga (2)
J. Andrew (3)
R. Jennifer Eunice (4)
Department of Electronics and Instrumentation Engineering, Karunya Institute of Technology and Sciences
Department of Robotics Engineering, Karunya Institute of Technology and Sciences
Department of Computer Science Engineering, Karunya Institute of Technology and Sciences
Department of Computer Science Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education
Department of Mechatronics Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education
https://doi.org/10.1007/s44196-024-00678-3
Boruta technique
Interquartile range
Light gradient boosting classifier
Random forest
Random oversampling
title Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms
topic Boruta technique
Interquartile range
Light gradient boosting classifier
Random forest
Random oversampling
url https://doi.org/10.1007/s44196-024-00678-3