Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME
ABSTRACT Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels and poses significant health risks, such as cardiovascular disease and cognitive damage. Understanding the causes of diabetes is crucial to managing it and preventing complications. The clinical community...
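Main Authors: | Md. Manowarul Islam, Habibur Rahman Rifat, Md. Shamim Bin Shahid, Arnisha Akhter, Md Ashraf Uddin, Khandaker Mohammad Mohi Uddin |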
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2025-01-01 |
Series: | Engineering Reports |
Subjects: | diabetes prediction; gridsearchCV; machine learning; SHAP; LIME; quantile transformer |
Online Access: | https://doi.org/10.1002/eng2.13080 |
_version_ | 1832576626965086208 |
---|---|
author | Md. Manowarul Islam; Habibur Rahman Rifat; Md. Shamim Bin Shahid; Arnisha Akhter; Md Ashraf Uddin; Khandaker Mohammad Mohi Uddin |
author_facet | Md. Manowarul Islam; Habibur Rahman Rifat; Md. Shamim Bin Shahid; Arnisha Akhter; Md Ashraf Uddin; Khandaker Mohammad Mohi Uddin |
author_sort | Md. Manowarul Islam |
collection | DOAJ |
description | ABSTRACT Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels; it poses significant health risks, such as cardiovascular disease and cognitive damage. Understanding the causes of diabetes is crucial to managing it and preventing complications. The clinical community holds a large amount of diabetes diagnostic data, and machine learning algorithms can help find hidden patterns in it, retrieve data from databases, and predict outcomes. To design a more accurate diabetes classification algorithm, this study uses random oversampling and hyperparameter tuning. Whereas most existing methods were built on a single dataset, the proposed model is designed on two benchmark datasets to make it more generally applicable: the BRFSS dataset, which has multiple classes, and the Diabetes 2019 dataset, which has binary classes. Moreover, to improve the comprehensibility of the proposed model, several explainability methods, namely SHapley Additive Explanations (SHAP), Partial Dependency, and Local Interpretable Model‐agnostic Explanations (LIME), have been implemented; these are seldom used in previous work. The detailed explainability charts enable end users and practitioners to understand the factors behind any given diagnostic report. This research focuses on classifying type 2 diabetes using machine learning and explaining the outcomes of the model predictions. Random oversampling and a quantile transform are used to rectify imbalances in the datasets and make model training more robust. Model parameters were tuned with GridSearchCV to reach high accuracy on both the binary and multi‐class datasets. The proposed model is evaluated on the two datasets using performance metrics. The extra trees classifier (ET) performed exceptionally well, achieving 97.23% accuracy on the multi‐class dataset and 97.45% on the binary dataset. |
format | Article |
id | doaj-art-7b38567423b649cdbb214f8c691c57a8 |
institution | Kabale University |
issn | 2577-8196 |
language | English |
publishDate | 2025-01-01 |
publisher | Wiley |
record_format | Article |
series | Engineering Reports |
spelling | doaj-art-7b38567423b649cdbb214f8c691c57a8; 2025-01-31T00:22:49Z; eng; Wiley; Engineering Reports; 2577-8196; 2025-01-01; 7; 1; n/a; n/a; 10.1002/eng2.13080; Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME; Md. Manowarul Islam (0); Habibur Rahman Rifat (1); Md. Shamim Bin Shahid (2); Arnisha Akhter (3); Md Ashraf Uddin (4); Khandaker Mohammad Mohi Uddin (5); (0)-(4) Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh; (5) Department of Computer Science and Engineering, Southeast University, Dhaka, Bangladesh; https://doi.org/10.1002/eng2.13080; diabetes prediction; gridsearchCV; machine learning; SHAP; LIME; quantile transformer |
spellingShingle | Md. Manowarul Islam; Habibur Rahman Rifat; Md. Shamim Bin Shahid; Arnisha Akhter; Md Ashraf Uddin; Khandaker Mohammad Mohi Uddin; Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME; Engineering Reports; diabetes prediction; gridsearchCV; machine learning; SHAP; LIME; quantile transformer |
title | Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME |
title_full | Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME |
title_fullStr | Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME |
title_full_unstemmed | Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME |
title_short | Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME |
title_sort | explainable machine learning for efficient diabetes prediction using hyperparameter tuning shap analysis partial dependency and lime |
topic | diabetes prediction; gridsearchCV; machine learning; SHAP; LIME; quantile transformer |
url | https://doi.org/10.1002/eng2.13080 |
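
The abstract names random oversampling and a quantile transform as preprocessing steps. The sketch below illustrates both ideas in plain Python on a toy, made-up feature column; it is not the authors' code. The function names `random_oversample` and `quantile_transform`, the seed, and the sample values are all hypothetical, and the quantile mapping is a simplified rank-based version rather than the interpolating transformer a library would provide.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def quantile_transform(values):
    """Map each value to its empirical quantile in (0, 1]: the fraction of
    observations less than or equal to it (simplified, no interpolation)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


if __name__ == "__main__":
    # Made-up single-feature rows with an imbalanced binary label.
    rows = [[22.0], [31.5], [27.3], [40.1], [24.8], [35.0]]
    labels = [0, 0, 0, 0, 1, 1]
    balanced_rows, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))
    print(quantile_transform([row[0] for row in rows]))
```

In practice one would more likely reach for existing implementations such as `RandomOverSampler` in imbalanced-learn and `QuantileTransformer` in scikit-learn. Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.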