Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME

ABSTRACT Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels and poses significant health risks, such as cardiovascular disease and cognitive damage. Understanding the causes of diabetes is crucial to managing it and preventing complications. The clinical community...

Bibliographic Details
Main Authors:	Md. Manowarul Islam, Habibur Rahman Rifat, Md. Shamim Bin Shahid, Arnisha Akhter, Md Ashraf Uddin, Khandaker Mohammad Mohi Uddin
Format:	Article
Language:	English
Published:	Wiley 2025-01-01
Series:	Engineering Reports
Subjects:	diabetes prediction; gridsearchCV; machine learning; SHAP; LIME; quantile transformer
Online Access:	https://doi.org/10.1002/eng2.13080

collection	DOAJ
description	ABSTRACT Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels; it poses significant health risks, such as cardiovascular disease and cognitive damage. Understanding the causes of diabetes is crucial to managing it and preventing complications. The clinical community holds a large amount of diabetes diagnostic data, and machine learning algorithms may simplify finding hidden patterns, retrieving data from databases, and predicting outcomes. To design a more accurate diabetes classification algorithm, this study uses random oversampling and hyperparameter tuning. Whereas most existing methods were built on a single dataset, the proposed model is designed around two benchmark datasets to make it more generally applicable: the BRFSS dataset, which has multiple classes, and the Diabetes 2019 dataset, which has binary classes. To improve the comprehensibility of the proposed model, several explainability methods are implemented, namely SHapley Additive Explanations (SHAP), partial dependency, and Local Interpretable Model‐agnostic Explanations (LIME), which are rarely seen in previous work. The detailed explainability charts enable end users or practitioners to understand the factors behind any given diagnostic report. This research focuses on classifying type 2 diabetes using machine learning and explaining the outcomes derived from the model predictions. Random oversampling and a quantile transform are used to rectify imbalances in the dataset and make model training more robust. Parameters were tuned with gridsearchCV to optimize the models on both the binary and multi‐class datasets. The proposed model is evaluated on the two datasets using performance metrics. The extra trees classifier (ET) achieved 97.23% accuracy on the multi‐class dataset and 97.45% on the binary dataset.
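The random oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `random_oversample`, the toy rows, and the seed are assumptions made for illustration, and only the Python standard library is used. The idea is to duplicate randomly chosen minority-class rows until every class has as many rows as the largest class.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy, made-up data: six rows of class 0 and two rows of class 1.
rows = [[25, 22.0], [31, 24.5], [40, 27.1], [28, 21.3],
        [35, 23.8], [52, 26.0], [61, 33.4], [58, 31.9]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))  # class counts after balancing
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data; whether the study did this is not stated in the abstract.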
id	doaj-art-7b38567423b649cdbb214f8c691c57a8
institution	Kabale University
issn	2577-8196
spelling	doaj-art-7b38567423b649cdbb214f8c691c57a8; 2025-01-31T00:22:49Z; eng; Wiley; Engineering Reports; 2577-8196; 2025-01-01; 7; 1; n/a; n/a; 10.1002/eng2.13080; Md. Manowarul Islam (Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh); Habibur Rahman Rifat (Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh); Md. Shamim Bin Shahid (Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh); Arnisha Akhter (Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh); Md Ashraf Uddin (Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh); Khandaker Mohammad Mohi Uddin (Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh)


Similar Items