Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database

This study provides an in-depth review and comparison of diabetes prediction models using the Pima Indian Diabetes database. The main aim is to contrast and evaluate the performance of two distinct predictive models: K-means clustering and Random Forest. The research begins by introducing the signif...

Bibliographic Details
Main Author:	Zhao Yize
Format:	Article
Language:	English
Published:	EDP Sciences 2025-01-01
Series:	ITM Web of Conferences
Online Access:	https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf

author	Zhao Yize
collection	DOAJ
description	This study provides an in-depth review and comparison of diabetes prediction models using the Pima Indian Diabetes database. The main aim is to contrast and evaluate the performance of two distinct predictive models: K-means clustering and Random Forest. The research begins by introducing the significance of accurate diabetes prediction and the methodologies used in the analysis. The K-means model groups data points into separate clusters according to their characteristics and achieves an accuracy of 90.04% in diabetes prediction. In comparison, the Random Forest model, which builds multiple decision trees (DT) to make its predictions, outperforms several widely used algorithms, including K-Nearest Neighbours (KNN), Logistic Regression (LR), DT, Support Vector Machines (SVM), and Gradient Boosting (GB). The study finds that while both models are effective, the Random Forest model provides higher predictive accuracy. These findings underscore the models’ potential for use in real-world medical diagnosis, where they can help identify people at risk of developing diabetes and support early prevention. Future research directions include further refinement of these models and their application to larger and more diverse datasets to improve prediction accuracy and generalizability.
format	Article
id	doaj-art-5d04c233ee73456f89fc3dede4bc162f
institution	OA Journals
issn	2271-2097
language	English
publishDate	2025-01-01
publisher	EDP Sciences
record_format	Article
series	ITM Web of Conferences
spelling	doaj-art-5d04c233ee73456f89fc3dede4bc162f; 2025-08-20T01:59:56Z; eng; EDP Sciences; ITM Web of Conferences; 2271-2097; 2025-01-01; 70; 02021; 10.1051/itmconf/20257002021; itmconf_dai2024_02021; Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database; Zhao Yize (Department of Statistical Science, University College London); https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf
title	Comparative Analysis of Diabetes Prediction Models Using the Pima Indian Diabetes Database
url	https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02021.pdf
