Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database

This study provides an in-depth review and comparison of diabetes prediction models using the Pima Indian Diabetes database. The main aim is to contrast and evaluate the performance of two distinct predictive models: K-means clustering and Random Forest. The research begins by introducing the signif...

Bibliographic Details
Main Author: Zhao Yize
Format: Article
Language: English
Published: EDP Sciences 2025-01-01
Series: ITM Web of Conferences
Online Access: https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf
author Zhao Yize
collection DOAJ
description This study reviews and compares diabetes prediction models on the Pima Indian Diabetes database. Its main aim is to evaluate and contrast two distinct predictive models: K-means clustering and Random Forest. The paper first introduces the significance of accurate diabetes prediction and the methodologies used in the analysis. The K-means model groups data points into separate clusters according to their characteristics and achieves an accuracy of 90.04% in diabetes prediction. The Random Forest model, which builds multiple decision trees (DT) and combines their predictions, outperforms several widely used algorithms, including K-Nearest Neighbours (KNN), Logistic Regression (LR), DT, Support Vector Machines (SVM), and Gradient Boosting (GB). The study finds that while both models are effective, the Random Forest model provides higher predictive accuracy. These findings underscore the models' potential for use in real-world medical diagnosis, where they can help identify people at risk of developing diabetes so that prevention can start early. Future research directions include further refinement of these models and their application to larger and more diverse datasets to improve prediction accuracy and generalizability.
format Article
id doaj-art-5d04c233ee73456f89fc3dede4bc162f
institution Kabale University
issn 2271-2097
language English
publishDate 2025-01-01
publisher EDP Sciences
record_format Article
series ITM Web of Conferences
spelling doaj-art-5d04c233ee73456f89fc3dede4bc162f; 2025-02-07T08:21:11Z; eng; EDP Sciences; ITM Web of Conferences; 2271-2097; 2025-01-01; 70; 02021; 10.1051/itmconf/20257002021; itmconf_dai2024_02021; Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database; Zhao Yize (Department of Statistical Science, University College London); https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf
title Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database
url https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf
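
Illustrative note on the abstract: K-means is unsupervised, so reporting a prediction accuracy for it requires some rule for turning clusters into class labels. The record does not say which rule the article uses. The sketch below assumes one common convention (label each cluster with its majority class) and also shows the majority vote that a Random Forest applies across its trees. All function names and the toy values are hypothetical and are not taken from the article or from the Pima dataset; real features on different scales would normally be standardized first.

```python
from collections import Counter

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def cluster_accuracy(assign, labels):
    """Label each cluster with its majority class, then score against the true labels."""
    correct = 0
    for c in set(assign):
        member_labels = [y for a, y in zip(assign, labels) if a == c]
        correct += Counter(member_labels).most_common(1)[0][1]
    return correct / len(labels)

def majority_vote(tree_predictions):
    """Combine per-tree class predictions for one sample, as a forest does."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Toy, made-up (glucose, BMI) pairs; NOT taken from the Pima dataset.
toy_points = [(85, 22.0), (90, 24.5), (88, 23.1), (160, 35.2), (170, 33.8), (155, 36.0)]
toy_labels = [0, 0, 0, 1, 1, 1]

assign = kmeans(toy_points, k=2)
accuracy = cluster_accuracy(assign, toy_labels)
vote = majority_vote([1, 0, 1])
```

On this separable toy data the two clusters line up with the two classes; on real data the majority-class mapping gives an optimistic upper bound because it uses the true labels to name the clusters.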