Enhancing diabetes risk prediction through focal active learning and machine learning models.

Bibliographic Details
Main Authors: Wangyouchen Zhang, Zhenhua Xia, Guoqing Cai, Junhao Wang, Xutao Dong
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0327120
collection DOAJ
description To improve diabetes risk prediction, this study proposes a method that combines a focal active learning strategy with machine learning models. Existing models often perform poorly on imbalanced medical datasets, in which minority-class instances such as diabetic cases are underrepresented. The proposed Focal Active Learning method addresses this imbalance in two steps: a clustering-based procedure identifies representative data points (called foci), and similarity-based sampling then iteratively builds a smaller labeled dataset (sub-pool) around them. Selecting informative instances in this way yields better predictions from fewer labeled samples, makes more efficient use of the data, and reduces labeling costs, which targets two common weaknesses: poor minority-class performance and limited generalization. The method also integrates SHAP (SHapley Additive exPlanations) to quantify feature importance and applies attention mechanisms to adjust feature weights dynamically, improving both interpretability and predictive performance. In the experiments, the approach reached an accuracy of 97.41% and a recall of 94.70%, compared with the roughly 95% accuracy and 92% recall that the authors report as typical of traditional models. Generalization was further validated on the public Pima Indians Diabetes Database, where the approach again outperformed traditional models in both accuracy and recall. The approach could support early diabetes screening in clinical settings, helping healthcare professionals reduce diagnostic errors and optimize resource allocation.
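The foci and sub-pool steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the clustering algorithm nor the similarity measure, so plain k-means and Euclidean distance are assumed here, and the function names (kmeans, find_foci, build_subpool) and the budget parameter are hypothetical. Labeling, SHAP and the attention mechanism are left out.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means returning k centroids (assumed stand-in for the clustering step)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def find_foci(points, k, seed=0):
    """Return the index of the real data point closest to each centroid (the foci)."""
    foci = []
    for centroid in kmeans(points, k, seed=seed):
        candidates = [i for i in range(len(points)) if i not in foci]
        foci.append(min(candidates, key=lambda i: distance(points[i], centroid)))
    return foci


def build_subpool(points, foci, budget):
    """Grow a sub-pool around the foci: each round, every focus adds its
    nearest not-yet-selected point, until the budget is reached."""
    selected = list(foci)[:budget]
    chosen = set(selected)
    while len(selected) < budget and len(chosen) < len(points):
        for f in foci:
            remaining = [i for i in range(len(points)) if i not in chosen]
            if not remaining or len(selected) >= budget:
                break
            nearest = min(remaining, key=lambda i: distance(points[i], points[f]))
            selected.append(nearest)
            chosen.add(nearest)
    return selected


if __name__ == "__main__":
    # Synthetic imbalanced data: 90 majority points, 10 minority points.
    rng = random.Random(42)
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    minority = [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(10)]
    data = majority + minority
    foci = find_foci(data, k=2)
    subpool = build_subpool(data, foci, budget=20)
    minority_share = sum(1 for i in subpool if i >= 90) / len(subpool)
    print("foci:", foci, "minority share in sub-pool:", minority_share)
```

Because every focus contributes one neighbour per round, a small cluster gets as many slots in the sub-pool as a large one. That is one plausible way for such a scheme to raise the minority share among the samples sent for labeling, though the paper's actual sampling rule may differ.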
id doaj-art-55ace4852ef448889e4f5bc12d53ea7d
institution Kabale University
issn 1932-6203
citation PLoS ONE, vol. 20, no. 7, e0327120 (2025-01-01). https://doi.org/10.1371/journal.pone.0327120