Performance of Machine Learning Classifiers for Diabetes Prediction

In this study, machine learning (ML) classifiers were evaluated for their effectiveness in predicting diabetes using the Pima Indians Diabetes Database. The dataset included 768 instances with nine attributes, where the target variable indicated whether a patient tested positive for diabetes. The cl...

Bibliographic Details
Main Authors: Mijala Manandhar, Shaikat Baidya, Babalpreet Kaur, Katia Atoji
Format: Article
Language: English
Published: IJMADA 2024-08-01
Series: International Journal of Management and Data Analytics
Subjects: healthcare, early diagnostics, insulin, glucose, patient care
Online Access: https://ijmada.com/index.php/ijmada/article/view/39
author Mijala Manandhar
Shaikat Baidya
Babalpreet Kaur
Katia Atoji
collection DOAJ
description In this study, machine learning (ML) classifiers were evaluated for their effectiveness in predicting diabetes using the Pima Indians Diabetes Database. The dataset included 768 instances with nine attributes, where the target variable indicated whether a patient tested positive for diabetes. The classifiers were grouped into three families: Function (Logistic Regression, Multilayer Perceptron, Stochastic Gradient Descent), Rules (Decision Table, JRip, OneR), and Trees (Decision Stump, Hoeffding Tree, J48). Performance metrics such as accuracy, precision, recall, Matthews Correlation Coefficient, ROC Area, and F1-measure were used to compare the classifiers. Among the Function classifiers, Stochastic Gradient Descent (SGD) demonstrated the highest performance, particularly in handling large datasets and minimizing overfitting. Logistic Regression and Multilayer Perceptron also showed robust results, but SGD was superior on most metrics. Among the Rules classifiers, JRip outperformed the others due to its iterative rule optimization, whereas OneR's simplicity resulted in the lowest performance. Decision Table offered a clear representation of decision rules but was limited by the complexity of the dataset. In the Trees group, J48 was the most effective, benefiting from its ability to handle complex interactions and numerous features. The study highlights the potential of ML algorithms in early diabetes detection, enabling timely intervention and personalized management strategies. The importance of key predictors such as plasma glucose, BMI, and age was emphasized. Future research should focus on integrating multiple datasets and exploring more complex ML algorithms to enhance prediction accuracy and generalization. The development of real-time predictive systems is crucial for improving clinical processes and patient outcomes.
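The description above names several comparison metrics without defining them. As a rough illustration only, the following Python sketch shows how accuracy, precision, recall, F1-measure, and the Matthews Correlation Coefficient can be derived from the four cells of a binary confusion matrix. The function names `confusion_counts` and `binary_metrics` and the sample labels are hypothetical and are not taken from the article; the study's own tooling and results are not reproduced here. ROC Area is omitted because it requires ranked prediction scores rather than a single confusion matrix.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn); `positive` is the label assumed to mean 'tested positive'."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Any ratio with a zero denominator is reported as 0.0 (a common convention,
    chosen here as an assumption).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * tp) / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up labels for illustration; not data from the Pima Indians dataset.
    y_true = [1, 1, 0, 0, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1, 0, 0, 1]
    for name, value in binary_metrics(*confusion_counts(y_true, y_pred)).items():
        print(f"{name}: {value:.3f}")
```

Reporting MCC alongside accuracy is a reasonable choice for a dataset with unequal class sizes, since MCC uses all four confusion-matrix cells.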
format Article
id doaj-art-46c9c512579e4224825cdd25d03840cd
institution Kabale University
issn 2816-9395
language English
publishDate 2024-08-01
publisher IJMADA
record_format Article
series International Journal of Management and Data Analytics
spelling doaj-art-46c9c512579e4224825cdd25d03840cd 2025-01-20T15:45:31Z eng IJMADA International Journal of Management and Data Analytics 2816-9395 2024-08-01 411839 Performance of Machine Learning Classifiers for Diabetes Prediction Mijala Manandhar (University Canada West), Shaikat Baidya (University Canada West), Babalpreet Kaur (University Canada West), Katia Atoji (University Canada West) https://ijmada.com/index.php/ijmada/article/view/39 healthcare; early diagnostics; insulin; glucose; patient care
title Performance of Machine Learning Classifiers for Diabetes Prediction
topic healthcare
early diagnostics
insulin
glucose
patient care
url https://ijmada.com/index.php/ijmada/article/view/39