Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME

Bibliographic Details
Main Authors: Md. Manowarul Islam, Habibur Rahman Rifat, Md. Shamim Bin Shahid, Arnisha Akhter, Md Ashraf Uddin, Khandaker Mohammad Mohi Uddin
Format: Article
Language: English
Published: Wiley, 2025-01-01
Series: Engineering Reports
Subjects: diabetes prediction; gridsearchCV; machine learning; SHAP; LIME; quantile transformer
Online Access:https://doi.org/10.1002/eng2.13080
Collection: DOAJ

Abstract
Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels, and it poses significant health risks such as cardiovascular disease and cognitive impairment. Understanding the causes of diabetes is crucial to managing it and preventing complications. The clinical community holds a large amount of diabetes diagnostic data, and machine learning algorithms can simplify finding hidden patterns in it, retrieving data from databases, and predicting outcomes. To design a more accurate diabetes classification model, this study uses random oversampling and hyperparameter tuning. Whereas most existing methods are built on a single dataset, our proposed model is developed on two benchmark datasets to improve its generalizability: the multi-class BRFSS dataset and the binary-class Diabetes 2019 dataset. In addition, to make the model easier to interpret, we apply several explainability methods, namely SHapley Additive exPlanations (SHAP), partial dependence, and Local Interpretable Model-agnostic Explanations (LIME), which have rarely been used in previous work. The resulting explainability charts let end users and practitioners see which factors drive the prediction for any given diagnostic report. This research focuses on classifying type 2 diabetes with machine learning and explaining the model's predictions. Random oversampling is used to correct class imbalance, and a quantile transform is used to normalize feature distributions, making model training more robust. By tuning hyperparameters with GridSearchCV, we optimized our models to achieve high accuracy on both the binary and the multi-class dataset. We evaluate the proposed model on the two datasets using several performance metrics. The extra trees (ET) classifier performed particularly well, achieving 97.23% accuracy on the multi-class dataset and 97.45% on the binary dataset.

Affiliations: Md. Manowarul Islam, Habibur Rahman Rifat, Md. Shamim Bin Shahid, Arnisha Akhter, and Md Ashraf Uddin: Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh; Khandaker Mohammad Mohi Uddin: Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh
Volume/Issue: 7(1)
ISSN: 2577-8196
DOI: 10.1002/eng2.13080