Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database

Bibliographic Details
Main Author: Zhao Yize
Format: Article
Language: English
Published: EDP Sciences 2025-01-01
Series: ITM Web of Conferences
Online Access: https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf
Description
Summary: This study reviews and compares diabetes prediction models built on the Pima Indian Diabetes database. Its main aim is to evaluate and contrast the performance of two distinct predictive models: K-means clustering and Random Forest. The research begins by introducing the significance of accurate diabetes prediction and the methodologies used in the analysis. The K-means model groups data points into separate clusters according to their characteristics and achieves an accuracy of 90.04% in diabetes prediction. The Random Forest model, which builds multiple decision trees (DT) and combines their predictions, outperforms several widely used algorithms, including K-Nearest Neighbours (KNN), Logistic Regression (LR), DT, Support Vector Machines (SVM), and Gradient Boosting (GB). The study finds that both models are effective, but that Random Forest provides higher predictive accuracy. These findings underscore the models' potential for use in real-world medical diagnosis, where they can help identify people at risk of developing diabetes and support early prevention. Future research directions include further refinement of these models and their application to larger and more diverse datasets to improve prediction accuracy and generalizability.
ISSN: 2271-2097
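
The summary reports an accuracy figure for K-means, which is an unsupervised method, but this record does not say how clusters were mapped to diabetes labels. The following is a minimal Python sketch of one common approach, assuming each cluster is assigned the majority label of its members. The data points, the feature names (glucose, BMI), and the function names are hypothetical illustrations, not values or code from the article or the Pima Indian Diabetes database.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


def majority_vote_accuracy(assignments, labels):
    """Label each cluster with its most frequent true label (an assumption),
    then report the fraction of points whose cluster label matches."""
    correct = 0
    for c in set(assignments):
        member_labels = [y for a, y in zip(assignments, labels) if a == c]
        correct += Counter(member_labels).most_common(1)[0][1]
    return correct / len(labels)


# Hypothetical toy data: (glucose, BMI) pairs, NOT taken from the Pima dataset.
points = [(85, 22), (90, 24), (95, 23), (88, 25),
          (160, 35), (170, 33), (165, 36), (155, 34)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = diabetic, 0 = non-diabetic (made up)

assignments, centroids = kmeans(points, k=2)
accuracy = majority_vote_accuracy(assignments, labels)
print(f"majority-vote accuracy on toy data: {accuracy:.2f}")
```

The toy groups are deliberately well separated, so the sketch says nothing about the accuracy achievable on real data. In practice the features would usually be standardized before clustering, since unscaled K-means is dominated by the feature with the largest numeric range.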