Optimizing Feature Selection and Machine Learning Algorithms for Early Detection of Prediabetes Risk: Comparative Study

Abstract Background: Prediabetes is an intermediate stage between normal glucose metabolism and diabetes and is associated with increased risk of complications like cardiovascular disease and kidney failure. Objective: It is crucial to recognize individuals with predia...

Bibliographic Details
Main Authors:	Mahmoud B Almadhoun, MA Burhanuddin
Format:	Article
Language:	English
Published:	JMIR Publications 2025-07-01
Series:	JMIR Bioinformatics and Biotechnology
Online Access:	https://bioinform.jmir.org/2025/1/e70621

author	Mahmoud B Almadhoun, MA Burhanuddin
collection	DOAJ
description	Abstract Background: Prediabetes is an intermediate stage between normal glucose metabolism and diabetes and is associated with an increased risk of complications such as cardiovascular disease and kidney failure. Objective: Recognizing individuals with prediabetes early is crucial for applying timely interventions that slow or prevent the development of diabetes. This study aims to compare the effectiveness of machine learning (ML) algorithms in predicting prediabetes and identifying its key clinical predictors. Methods: Multiple ML models are evaluated in this study, including random forest, extreme gradient boosting (XGBoost), support vector machine (SVM), and k-nearest neighbors (KNN). Results: Among the models tested, random forest achieved a cross-validated ROC-AUC (area under the receiver operating characteristic curve) of 0.9117, highlighting its robustness in generalizing across datasets. XGBoost followed closely, providing balanced accuracy in distinguishing between normal and prediabetic cases. SVM and KNN performed adequately as baseline models but showed limited sensitivity. SHAP analysis identified BMI, age, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol as the key predictors across models. Hyperparameter tuning significantly improved performance; for example, the ROC-AUC for SVM increased from 0.813 (default) to 0.863 (tuned). PCA retained 12 components while preserving 95% of the variance in the dataset. Conclusions: This research demonstrates that optimized ML models, especially random forest and XGBoost, are effective tools for assessing early prediabetes risk. Combining SHAP analysis with LASSO and PCA enhances transparency, supporting the integration of these models into real-time clinical decision support systems. Future directions include validating these models in diverse clinical settings and integrating additional biomarkers to improve prediction accuracy, offering a promising avenue for early intervention and personalized treatment strategies in preventive health care.
format	Article
id	doaj-art-d561e75d2a8f4d02bcd88fd26c9d1871
institution	Kabale University
issn	2563-3570
language	English
publishDate	2025-07-01
publisher	JMIR Publications
record_format	Article
series	JMIR Bioinformatics and Biotechnology
spelling	doaj-art-d561e75d2a8f4d02bcd88fd26c9d1871; 2025-08-20T04:01:09Z; eng; JMIR Publications; JMIR Bioinformatics and Biotechnology; ISSN 2563-3570; 2025-07-01; volume 6; e70621; DOI 10.2196/70621; Mahmoud B Almadhoun (http://orcid.org/0009-0001-3734-8735); MA Burhanuddin (http://orcid.org/0000-0001-8976-7416); https://bioinform.jmir.org/2025/1/e70621
title	Optimizing Feature Selection and Machine Learning Algorithms for Early Detection of Prediabetes Risk: Comparative Study
url	https://bioinform.jmir.org/2025/1/e70621
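
The abstract in this record reports two quantities that are easy to misread: a ROC-AUC score and a PCA component count chosen to retain 95% of the variance. The sketch below shows, under stated assumptions, how each is computed in principle. It is not the authors' code; the function names roc_auc and components_for_variance and all input values are hypothetical and chosen only for illustration. It uses the Python standard library only.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading principal components whose cumulative
    share of the total variance reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Toy inputs (illustrative only, not data from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)             # 3 of 4 pairs ranked correctly
toy_k = components_for_variance([6.0, 3.0, 1.0], 0.85)  # 0.6, then 0.9 cumulative share
```

On the toy inputs, toy_auc is 0.75 and toy_k is 2. The pairwise formulation is quadratic in the number of cases, which is fine for illustration; a production analysis would more likely rely on an established library implementation.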


Similar Items