Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database
This study provides an in-depth review and comparison of diabetes prediction models using the Pima Indian Diabetes database. The main aim is to contrast and evaluate the performance of two distinct predictive models: K-means clustering and Random Forest. The research begins by introducing the signif...
Main Author: | Zhao Yize |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences 2025-01-01 |
Series: | ITM Web of Conferences |
Online Access: | https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf |
_version_ | 1825206563090989056 |
---|---|
author | Zhao Yize |
collection | DOAJ |
description | This study provides an in-depth review and comparison of diabetes prediction models using the Pima Indian Diabetes database. The main aim is to contrast and evaluate the performance of two distinct predictive models: K-means clustering and Random Forest. The research begins by introducing the significance of accurate diabetes prediction and the methodologies used in the analysis. The K-means model operates by grouping data points into separate clusters according to their characteristics, achieving an accuracy of 90.04% in diabetes prediction. In comparison, the Random Forest model, which builds multiple decision trees (DT) to make its predictions, demonstrates superior performance over several widely used algorithms such as K-Nearest Neighbours (KNN), Logistic Regression (LR), DT, Support Vector Machines (SVM), and Gradient Boosting (GB). The study reveals that while both models are effective, the Random Forest model provides enhanced predictive accuracy. The findings underscore the models’ potential for use in real-world medical diagnosis, where they can assist in identifying people at risk of developing diabetes and in starting early prevention. Future research directions include further refinement of these models and their application to larger and more diverse datasets to improve prediction accuracy and generalizability. |
format | Article |
id | doaj-art-5d04c233ee73456f89fc3dede4bc162f |
institution | Kabale University |
issn | 2271-2097 |
language | English |
publishDate | 2025-01-01 |
publisher | EDP Sciences |
record_format | Article |
series | ITM Web of Conferences |
spelling | doaj-art-5d04c233ee73456f89fc3dede4bc162f; 2025-02-07T08:21:11Z; eng; EDP Sciences; ITM Web of Conferences; 2271-2097; 2025-01-01; 70; 02021; 10.1051/itmconf/20257002021; itmconf_dai2024_02021; Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database; Zhao Yize (Department of Statistical Science, University College London); This study provides an in-depth review and comparison of diabetes prediction models using the Pima Indian Diabetes database. The main aim is to contrast and evaluate the performance of two distinct predictive models: K-means clustering and Random Forest. The research begins by introducing the significance of accurate diabetes prediction and the methodologies used in the analysis. The K-means model operates by grouping data points into separate clusters according to their characteristics, achieving an accuracy of 90.04% in diabetes prediction. In comparison, the Random Forest model, which builds multiple decision trees (DT) to make its predictions, demonstrates superior performance over several widely used algorithms such as K-Nearest Neighbours (KNN), Logistic Regression (LR), DT, Support Vector Machines (SVM), and Gradient Boosting (GB). The study reveals that while both models are effective, the Random Forest model provides enhanced predictive accuracy. The findings underscore the models’ potential for use in real-world medical diagnosis, where they can assist in identifying people at risk of developing diabetes and in starting early prevention. Future research directions include further refinement of these models and their application to larger and more diverse datasets to improve prediction accuracy and generalizability.; https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf |
spellingShingle | Zhao Yize Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database ITM Web of Conferences |
title | Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database |
url | https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf |
work_keys_str_mv | AT zhaoyize comparativeanalysisofdiabetespredictionmodelsusingthepimaindiandiabetesdatabase |