MACHINE LEARNING BASED PREDICTION AND INSIGHTS OF DIABETES DISEASE: PIMA INDIAN AND FRANKFURT DATASETS

Bibliographic Details
Main Authors: Mohammad Raquibul Hossain, Md. Jamal Hossain, Md. Mijanoor Rahman, Mohammad Manjur Alam
Format: Article
Language: English
Published: Institute of Mechanics of Continua and Mathematical Sciences 2025-01-01
Series: Journal of Mechanics of Continua and Mathematical Sciences
Subjects: machine learning methods, diabetes prediction, logistic regression, classification, random forest, xgboost
Online Access: https://jmcms.s3.amazonaws.com/wp-content/uploads/2025/01/20062401/jmcms-2501015-MACHINE-LEARNING-BASED-PREDICTION-MH-JH-1.pdf
author Mohammad Raquibul Hossain
Md. Jamal Hossain
Md. Mijanoor Rahman
Mohammad Manjur Alam
collection DOAJ
description This paper focuses on predicting diabetes using machine learning models, an active and important area of research. Six machine learning methods were evaluated on three diabetes datasets to compare model performance. The methods are logistic regression, k-Nearest Neighbour, Gaussian Naïve Bayes, Decision Tree, Random Forest, and XGBoost. The datasets are the Pima Indian dataset, the Frankfurt Hospital dataset, and the combined dataset; each has eight feature variables and one target variable. Because the train-test split ratio can make a significant difference in model performance, two split ratios, 50-50 and 90-10, were tested. Model performance was evaluated using four metrics: precision, recall, F1-score, and accuracy. Random Forest and XGBoost were found to be highly efficient and the best-performing methods across all performance metrics, all datasets, and both train-test split ratios. They performed comparatively better on the combined dataset of 2768 instances, indicating the importance of a large dataset for better results. The 90-10 train-test split also produced better results than the 50-50 split for all datasets and for almost all models.
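The evaluation setup named in the description can be illustrated with a minimal sketch. It is not the authors' code: the helper names `split_sizes` and `binary_metrics` and the toy label lists are assumptions made for illustration. Only the combined dataset size (2768), the two split ratios, and the four metric names come from the record.

```python
def split_sizes(n_instances, train_fraction):
    """Return (train, test) instance counts for a given train fraction."""
    n_train = int(n_instances * train_fraction)
    return n_train, n_instances - n_train


def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1-score and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


COMBINED_SIZE = 2768  # combined dataset size stated in the description

# The two split ratios named in the description: 50-50 and 90-10.
for train_fraction in (0.5, 0.9):
    print(train_fraction, split_sizes(COMBINED_SIZE, train_fraction))

# Toy labels (hypothetical, not taken from any of the datasets).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Under this sketch's truncating split, a 90-10 ratio leaves 277 of the 2768 combined instances for testing, against 1384 for a 50-50 ratio. The paper's own splitting procedure is not described in this record and may round differently.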
format Article
id doaj-art-14b5ca243b91447d83598c03cee07b2a
institution OA Journals
issn 0973-8975
2454-7190
language English
publishDate 2025-01-01
publisher Institute of Mechanics of Continua and Mathematical Sciences
record_format Article
series Journal of Mechanics of Continua and Mathematical Sciences
spelling doaj-art-14b5ca243b91447d83598c03cee07b2a
2025-08-20T01:54:45Z
eng
Journal of Mechanics of Continua and Mathematical Sciences, vol. 20, no. 1, pp. 99-114
10.26782/jmcms.2025.01.00007
Mohammad Raquibul Hossain: Department of Applied Mathematics, Noakhali Science and Technology University, Noakhali-3814, Bangladesh.
Md. Jamal Hossain: Department of Applied Mathematics, Noakhali Science and Technology University, Noakhali-3814, Bangladesh.
Md. Mijanoor Rahman: Department of Mathematics, Mawlana Bhashani Science and Technology University, Santosh, Tangail-1902, Bangladesh.
Mohammad Manjur Alam: Department of Computer Science and Engineering, International Islamic University Chittagong (IIUC), Chittagong-4318, Bangladesh.
title MACHINE LEARNING BASED PREDICTION AND INSIGHTS OF DIABETES DISEASE: PIMA INDIAN AND FRANKFURT DATASETS
topic machine learning methods
diabetes prediction
logistic regression
classification
random forest
xgboost
url https://jmcms.s3.amazonaws.com/wp-content/uploads/2025/01/20062401/jmcms-2501015-MACHINE-LEARNING-BASED-PREDICTION-MH-JH-1.pdf