An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values

Bibliographic Details
Main Authors: Kumarmangal Roy, Muneer Ahmad, Kinza Waqar, Kirthanaah Priyaah, Jamel Nebhen, Sultan S Alshamrani, Muhammad Ahsan Raza, Ihsan Ali
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2021/9953314
author Kumarmangal Roy
Muneer Ahmad
Kinza Waqar
Kirthanaah Priyaah
Jamel Nebhen
Sultan S Alshamrani
Muhammad Ahsan Raza
Ihsan Ali
collection DOAJ
description Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis is challenging because the condition depends on many interrelated factors. There is a need to develop decision support systems to assist medical practitioners in the diagnosis process. This research proposes a predictive model that achieves high classification accuracy for type 2 diabetes. The study consisted of two parts. First, it investigated handling missing data through three imputation methods: median value imputation, K-nearest neighbor imputation, and iterative imputation. It then evaluated how each imputation method affected classification accuracy, using linear, tree-based, and ensemble classification algorithms. Second, an Artificial Neural Network was used to model the best-performing imputed data, balanced with SMOTETomek so that each class is represented fairly. This approach achieved the best accuracy, 98% on the test data, outperforming accuracies reported in prior studies using the same dataset. The dataset used in this study is limited in terms of gender and population. As future work, the study recommends adopting a larger population sample without geographic boundaries. Additionally, because the developed Artificial Neural Network model did not undergo any specific hyperparameter tuning, it would be interesting to explore tuning on normalized data to optimize accuracy further.
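Two of the imputation strategies named in the description can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `median_impute` and `knn_impute`, the distance rule for rows with missing entries, and the toy values are all assumptions made for illustration only.

```python
import math
from statistics import median


def median_impute(rows):
    """Return a copy of rows with each None replaced by its column's median."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [[medians[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by the mean of that
    column over the k nearest rows that have the column observed."""

    def distance(a, b):
        # Assumed rule: compare only columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [val for _, val in donors[:k]]
            if nearest:  # leave the gap if no donor row exists
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Toy data (hypothetical values, not taken from the study's dataset).
toy = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [10.0, 100.0],
]
median_filled = median_impute(toy)
knn_filled = knn_impute(toy, k=2)
```

On this toy input the two strategies fill the same gap differently: the median uses every observed value in the column, while the nearest-neighbor variant uses only the rows closest to the incomplete one, which is the kind of difference the study compares through downstream classification accuracy.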
format Article
id doaj-art-0f754e16132143fe8a015c8d16caa28e
institution OA Journals
issn 1076-2787
1099-0526
language English
publishDate 2021-01-01
publisher Wiley
record_format Article
series Complexity
spelling doaj-art-0f754e16132143fe8a015c8d16caa28e
2025-08-20T02:02:19Z
eng
Wiley
Complexity
1076-2787
1099-0526
2021-01-01
2021
10.1155/2021/9953314
9953314
An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values
Kumarmangal Roy: Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia
Muneer Ahmad: Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia
Kinza Waqar: Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia
Kirthanaah Priyaah: Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia
Jamel Nebhen: Prince Sattam Bin Abdulaziz University, College of Computer Engineering and Sciences, P.O. Box 151, Alkharj 11942, Saudi Arabia
Sultan S Alshamrani: Department of Information Technology, College of Computer and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
Muhammad Ahsan Raza: Department of Information Technology, Bahauddin Zakariya University, Multan 60000, Pakistan
Ihsan Ali: Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia
http://dx.doi.org/10.1155/2021/9953314
title An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values
url http://dx.doi.org/10.1155/2021/9953314