MACHINE LEARNING BASED PREDICTION AND INSIGHTS OF DIABETES DISEASE: PIMA INDIAN AND FRANKFURT DATASETS
This paper focuses on predicting diabetes using machine learning models, a very active and important area of research. Six machine learning methods were evaluated on three diabetes datasets to compare model performance. The methods are logistic regression, k-Ne...
| Main Authors: | Mohammad Raquibul Hossain, Md. Jamal Hossain, Md. Mijanoor Rahman, Mohammad Manjur Alam |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Institute of Mechanics of Continua and Mathematical Sciences, 2025-01-01 |
| Series: | Journal of Mechanics of Continua and Mathematical Sciences |
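| Subjects: | machine learning methods; diabetes prediction; logistic regression; classification; random forest; xgboost |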
| Online Access: | https://jmcms.s3.amazonaws.com/wp-content/uploads/2025/01/20062401/jmcms-2501015-MACHINE-LEARNING-BASED-PREDICTION-MH-JH-1.pdf |
| _version_ | 1850264228524457984 |
|---|---|
| author | Mohammad Raquibul Hossain; Md. Jamal Hossain; Md. Mijanoor Rahman; Mohammad Manjur Alam |
| author_sort | Mohammad Raquibul Hossain |
| collection | DOAJ |
| description | This paper focuses on predicting diabetes using machine learning models, a very active and important area of research. Six machine learning methods were evaluated on three diabetes datasets to compare model performance. The methods are logistic regression, k-Nearest Neighbour, Gaussian Naïve Bayes, Decision Tree, Random Forest, and XGBoost. The datasets are the Pima Indian dataset, the Frankfurt Hospital dataset, and a combined dataset; each has eight feature variables and one target variable. Because the train-test split ratio can make a significant difference in model performance, two split ratios, 50-50 and 90-10, were tested. Model performance was evaluated using four metrics: precision, recall, F1-score, and accuracy. Random Forest and XGBoost were found to be highly efficient and the best-performing of all the methods across all performance metrics, all datasets, and both train-test split ratios. They performed comparatively better on the combined dataset, which contains 2768 instances, indicating the importance of a large dataset for better results. In addition, the 90-10 train-test split produced better results than the 50-50 split for all the datasets and for almost all models. |
| format | Article |
| id | doaj-art-14b5ca243b91447d83598c03cee07b2a |
| institution | OA Journals |
| issn | 0973-8975 2454-7190 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Institute of Mechanics of Continua and Mathematical Sciences |
| record_format | Article |
| series | Journal of Mechanics of Continua and Mathematical Sciences |
| spelling | doaj-art-14b5ca243b91447d83598c03cee07b2a; 2025-08-20T01:54:45Z; eng; Institute of Mechanics of Continua and Mathematical Sciences; Journal of Mechanics of Continua and Mathematical Sciences; ISSN 0973-8975, 2454-7190; 2025-01-01; vol. 20, no. 1, pp. 99-114; DOI 10.26782/jmcms.2025.01.00007; MACHINE LEARNING BASED PREDICTION AND INSIGHTS OF DIABETES DISEASE: PIMA INDIAN AND FRANKFURT DATASETS; Mohammad Raquibul Hossain (Department of Applied Mathematics, Noakhali Science and Technology University, Noakhali-3814, Bangladesh); Md. Jamal Hossain (Department of Applied Mathematics, Noakhali Science and Technology University, Noakhali-3814, Bangladesh); Md. Mijanoor Rahman (Department of Mathematics, Mawlana Bhashani Science and Technology University, Santosh, Tangail-1902, Bangladesh); Mohammad Manjur Alam (Department of Computer Science and Engineering, International Islamic University Chittagong (IIUC), Chittagong-4318, Bangladesh); https://jmcms.s3.amazonaws.com/wp-content/uploads/2025/01/20062401/jmcms-2501015-MACHINE-LEARNING-BASED-PREDICTION-MH-JH-1.pdf; machine learning methods; diabetes prediction; logistic regression; classification; random forest; xgboost |
| title | MACHINE LEARNING BASED PREDICTION AND INSIGHTS OF DIABETES DISEASE: PIMA INDIAN AND FRANKFURT DATASETS |
| topic | machine learning methods; diabetes prediction; logistic regression; classification; random forest; xgboost |
| url | https://jmcms.s3.amazonaws.com/wp-content/uploads/2025/01/20062401/jmcms-2501015-MACHINE-LEARNING-BASED-PREDICTION-MH-JH-1.pdf |
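
The record's description names two train-test split ratios (50-50 and 90-10) and four evaluation metrics (precision, recall, F1-score, accuracy). As a rough illustration of what those terms mean, here is a minimal sketch in plain Python. The function names `train_test_split` and `binary_metrics`, the shuffling seed, and the rounding of the test-set size are all assumptions made for this example; the record does not say how the authors split the data or computed the metrics.

```python
import random


def train_test_split(rows, test_fraction, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test).

    Hypothetical helper: the seed and the rounding rule are illustrative
    choices, not taken from the paper.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1-score, and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # 2768 is the size of the combined dataset given in the description.
    for fraction in (0.50, 0.10):
        train, test = train_test_split(range(2768), fraction)
        print(fraction, len(train), len(test))
```

Under these assumptions, a 90-10 split of the 2768-instance combined dataset leaves far fewer test instances than a 50-50 split, so scores from the two ratios are measured on test sets of very different sizes.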