Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms
Abstract: Diabetes mellitus is considered one of the main causes of death worldwide. If diabetes is not diagnosed and treated early, it can cause several other health problems, such as kidney disease, nerve disease, vision problems, and brain issues. Early detection of diabetes reduces healthc...
| Main Authors: | G. R. Ashisha, X. Anitha Mary, E. Grace Mary Kanaga, J. Andrew, R. Jennifer Eunice |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2024-11-01 |
| Series: | International Journal of Computational Intelligence Systems |
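| Subjects: | Boruta technique; Interquartile range; Light gradient boosting classifier; Random forest; Random oversampling |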
| Online Access: | https://doi.org/10.1007/s44196-024-00678-3 |
| _version_ | 1850062253109280768 |
|---|---|
| author | G. R. Ashisha; X. Anitha Mary; E. Grace Mary Kanaga; J. Andrew; R. Jennifer Eunice |
| author_facet | G. R. Ashisha; X. Anitha Mary; E. Grace Mary Kanaga; J. Andrew; R. Jennifer Eunice |
| author_sort | G. R. Ashisha |
| collection | DOAJ |
| description | Abstract: Diabetes mellitus is considered one of the main causes of death worldwide. If diabetes is not diagnosed and treated early, it can cause several other health problems, such as kidney disease, nerve disease, vision problems, and brain issues. Early detection of diabetes reduces healthcare costs and minimizes the chance of serious complications. In this work, we propose an e-diagnostic model for diabetes classification via a machine learning algorithm that can be executed on the Internet of Medical Things (IoMT). The study uses and analyses two benchmark datasets, the PIMA Indian Diabetes Dataset (PIDD) and the Behavioral Risk Factor Surveillance System (BRFSS) diabetes dataset, to classify diabetes. The proposed model consists of random oversampling to balance the class distribution, interquartile range-based outlier detection to eliminate outliers, and the Boruta algorithm to select the optimal features from the datasets. The proposed approach considers four ML algorithms, namely random forest, gradient boosting, light gradient boosting, and decision tree classifiers, as they are widely used classification algorithms for diabetes prediction. We evaluated all four ML algorithms via performance indicators such as accuracy, F1 score, recall, precision, and AUC-ROC. Comparative analysis suggests that the random forest algorithm outperforms the remaining classifiers, with an accuracy of 92% on the BRFSS diabetes dataset and 94% on the PIDD dataset, which is 3% higher than the accuracy reported in existing research. This research is helpful for assisting diabetologists in developing accurate treatment regimens for patients who are diabetic. |
| format | Article |
| id | doaj-art-ec83f8106c1d45c898f8f8841c0abc28 |
| institution | DOAJ |
| issn | 1875-6883 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Springer |
| record_format | Article |
| series | International Journal of Computational Intelligence Systems |
| spelling | doaj-art-ec83f8106c1d45c898f8f8841c0abc28; 2025-08-20T02:49:59Z; eng; Springer; International Journal of Computational Intelligence Systems; 1875-6883; 2024-11-01; 17 1 1 17; 10.1007/s44196-024-00678-3; Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms; G. R. Ashisha (Department of Electronics and Instrumentation Engineering, Karunya Institute of Technology and Sciences); X. Anitha Mary (Department of Robotics Engineering, Karunya Institute of Technology and Sciences); E. Grace Mary Kanaga (Department of Computer Science Engineering, Karunya Institute of Technology and Sciences); J. Andrew (Department of Computer Science Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education); R. Jennifer Eunice (Department of Mechatronics Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education); Abstract: Diabetes mellitus is considered one of the main causes of death worldwide. If diabetes is not diagnosed and treated early, it can cause several other health problems, such as kidney disease, nerve disease, vision problems, and brain issues. Early detection of diabetes reduces healthcare costs and minimizes the chance of serious complications. In this work, we propose an e-diagnostic model for diabetes classification via a machine learning algorithm that can be executed on the Internet of Medical Things (IoMT). The study uses and analyses two benchmark datasets, the PIMA Indian Diabetes Dataset (PIDD) and the Behavioral Risk Factor Surveillance System (BRFSS) diabetes dataset, to classify diabetes. The proposed model consists of random oversampling to balance the class distribution, interquartile range-based outlier detection to eliminate outliers, and the Boruta algorithm to select the optimal features from the datasets. The proposed approach considers four ML algorithms, namely random forest, gradient boosting, light gradient boosting, and decision tree classifiers, as they are widely used classification algorithms for diabetes prediction. We evaluated all four ML algorithms via performance indicators such as accuracy, F1 score, recall, precision, and AUC-ROC. Comparative analysis suggests that the random forest algorithm outperforms the remaining classifiers, with an accuracy of 92% on the BRFSS diabetes dataset and 94% on the PIDD dataset, which is 3% higher than the accuracy reported in existing research. This research is helpful for assisting diabetologists in developing accurate treatment regimens for patients who are diabetic.; https://doi.org/10.1007/s44196-024-00678-3; Boruta technique; Interquartile range; Light gradient boosting classifier; Random forest; Random oversampling |
| spellingShingle | G. R. Ashisha; X. Anitha Mary; E. Grace Mary Kanaga; J. Andrew; R. Jennifer Eunice; Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms; International Journal of Computational Intelligence Systems; Boruta technique; Interquartile range; Light gradient boosting classifier; Random forest; Random oversampling |
| title | Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms |
| title_full | Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms |
| title_fullStr | Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms |
| title_full_unstemmed | Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms |
| title_short | Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms |
| title_sort | random oversampling based diabetes classification via machine learning algorithms |
| topic | Boruta technique; Interquartile range; Light gradient boosting classifier; Random forest; Random oversampling |
| url | https://doi.org/10.1007/s44196-024-00678-3 |
| work_keys_str_mv | AT grashisha randomoversamplingbaseddiabetesclassificationviamachinelearningalgorithms AT xanithamary randomoversamplingbaseddiabetesclassificationviamachinelearningalgorithms AT egracemarykanaga randomoversamplingbaseddiabetesclassificationviamachinelearningalgorithms AT jandrew randomoversamplingbaseddiabetesclassificationviamachinelearningalgorithms AT rjennifereunice randomoversamplingbaseddiabetesclassificationviamachinelearningalgorithms |
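
The abstract in this record names two preprocessing steps, interquartile range (IQR) outlier detection and random oversampling, without giving their details. The following is a minimal standard-library sketch of what those two techniques generally do, not the authors' implementation. The 1.5 × IQR fence multiplier, the order of the two steps, the helper names (`iqr_bounds`, `remove_outliers`, `random_oversample`), and the toy rows are all assumptions made for illustration; the toy values do not come from PIDD or BRFSS.

```python
import random
import statistics
from collections import Counter


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, labels, k=1.5):
    """Drop every row with at least one feature outside its column's fences."""
    bounds = [iqr_bounds(col, k) for col in zip(*rows)]
    kept = [
        (row, label)
        for row, label in zip(rows, labels)
        if all(lo <= x <= hi for x, (lo, hi) in zip(row, bounds))
    ]
    return [r for r, _ in kept], [l for _, l in kept]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        out_rows.extend(rng.choices(pool, k=target - n))
        out_labels.extend([cls] * (target - n))
    return out_rows, out_labels


# Hypothetical toy data: (glucose, BMI) pairs with a binary label.
rows = [
    (85, 22.0), (90, 24.5), (95, 26.1), (100, 27.3), (105, 28.0),
    (110, 29.4), (150, 33.0), (160, 35.2), (500, 30.0),
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]

clean_rows, clean_labels = remove_outliers(rows, labels)
balanced_rows, balanced_labels = random_oversample(clean_rows, clean_labels)

print(Counter(clean_labels))     # class counts after outlier removal
print(Counter(balanced_labels))  # class counts after oversampling
```

In practice, oversampling is usually applied only to the training split, so that duplicated rows do not leak into the evaluation data; whether the article does this is not stated in the record.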