An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values
Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis of such a condition is challenging due to its complex interdependence on various factors. There is a need to develop critical decision support systems to assist medical practitioners in the diagnosis p...
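| Main Authors: | Kumarmangal Roy, Muneer Ahmad, Kinza Waqar, Kirthanaah Priyaah, Jamel Nebhen, Sultan S Alshamrani, Muhammad Ahsan Raza, Ihsan Ali |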
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2021-01-01 |
| Series: | Complexity |
| Online Access: | http://dx.doi.org/10.1155/2021/9953314 |
| _version_ | 1850235290764967936 |
|---|---|
| author | Kumarmangal Roy Muneer Ahmad Kinza Waqar Kirthanaah Priyaah Jamel Nebhen Sultan S Alshamrani Muhammad Ahsan Raza Ihsan Ali |
| collection | DOAJ |
| description | Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis of the condition is challenging because it depends on many interrelated factors. There is a need to develop critical decision support systems to assist medical practitioners in the diagnosis process. This research proposes a predictive model that can achieve high classification accuracy for type 2 diabetes. The study consisted of two fundamental parts. Firstly, the study investigated handling missing data through imputation, namely median value imputation, K-nearest neighbor imputation, and iterative imputation. The study then validated the effect of these imputations using various classification algorithms, i.e., linear, tree-based, and ensemble algorithms, to see how each imputation method affected classification accuracy. Secondly, an Artificial Neural Network was employed to model the best-performing imputed data, balanced with SMOTETomek to ensure that each class is represented fairly. This approach provided the best accuracy of 98% on the test data, outperforming accuracies achieved in prior studies using the same dataset. The dataset used in this study is limited with respect to gender and population. As future work, the study recommends adopting a larger population sample without geographic boundaries. Additionally, as the developed Artificial Neural Network model did not undergo any specific hyperparameter tuning, it would be interesting to explore tuning on top of normalized data to optimize accuracy further. |
| format | Article |
| id | doaj-art-0f754e16132143fe8a015c8d16caa28e |
| institution | OA Journals |
| issn | 1076-2787 1099-0526 |
| language | English |
| publishDate | 2021-01-01 |
| publisher | Wiley |
| record_format | Article |
| series | Complexity |
| spelling | doaj-art-0f754e16132143fe8a015c8d16caa28e; 2025-08-20T02:02:19Z; eng; Wiley; Complexity; 1076-2787; 1099-0526; 2021-01-01; 2021; 10.1155/2021/9953314; 9953314; An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values; Kumarmangal Roy (Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia); Muneer Ahmad (Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia); Kinza Waqar (Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia); Kirthanaah Priyaah (Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia); Jamel Nebhen (Prince Sattam Bin Abdulaziz University, College of Computer Engineering and Sciences, P.O. Box 151, Alkharj 11942, Saudi Arabia); Sultan S Alshamrani (Department of Information Technology, College of Computer and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia); Muhammad Ahsan Raza (Department of Information Technology, Bahauddin Zakariya University, Multan 60000, Pakistan); Ihsan Ali (Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia); http://dx.doi.org/10.1155/2021/9953314 |
| title | An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values |
| url | http://dx.doi.org/10.1155/2021/9953314 |
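
The abstract names median value imputation and K-nearest neighbor imputation as two of the strategies compared for handling missing data. The following is a minimal sketch of those two ideas in plain Python, not the authors' implementation: the function names (`median_impute`, `knn_impute`), the toy columns (glucose, BMI, age), and the choice of `k=2` are assumptions made for illustration, and iterative imputation is left out for brevity.

```python
from math import sqrt
from statistics import median


def median_impute(rows):
    """Replace each None with the median of the observed values in its column."""
    n_cols = len(rows[0])
    medians = [median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def _distance(a, b):
    """Euclidean distance over columns observed in both rows, rescaled to
    compensate for skipped columns. Returns None if no column is shared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    squared = sum((x - y) ** 2 for x, y in shared)
    return sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest rows
    in which the column is observed. The input is not modified."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                result[i][j] = sum(nearest) / len(nearest)
    return result


# Toy data with hypothetical columns (glucose, BMI, age); None marks a missing value.
data = [
    [148.0, 33.6, 50.0],
    [85.0, 26.6, 31.0],
    [183.0, None, 32.0],
    [89.0, 28.1, 21.0],
    [None, 43.1, 33.0],
]

median_filled = median_impute(data)
knn_filled = knn_impute(data, k=2)

print(median_filled[4][0], knn_filled[4][0])
```

Median imputation assigns the same value to every gap in a column, while the nearest-neighbor variant fills each gap from the rows that most resemble the incomplete one. The sketch does not standardize features, so columns with larger numeric ranges dominate the distance; a practical pipeline would likely scale the data first.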