Development of an electronic health record-based Human Immunodeficiency Virus (HIV) risk prediction model for women, incorporating social determinants of health

Bibliographic Details
Main Authors: Yiyang Liu, Aokun Chen, Hwayoung Cho, Khairul A. Siddiqi, Robert L. Cook, Mattia Prosperi
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Public Health
Online Access: https://doi.org/10.1186/s12889-025-23460-2
Description
Summary:
Background: Human Immunodeficiency Virus (HIV) pre-exposure prophylaxis (PrEP) prevents HIV transmission but has low uptake among women, and identifying women who could benefit from PrEP remains a challenge. This study developed a women-specific model to predict HIV risk within the next year using electronic health record (EHR) data and social determinants of health (SDoH).
Methods: We conducted a case-control study using EHR and claims data from a centralized patient repository in the Southeastern United States (OneFlorida+). The dataset was split into 60% training, 30% testing, and 10% calibration sets, and five-fold cross-validation was applied for hyperparameter tuning. Contextual-level SDoH measures were linked to the EHR and claims data. Several machine learning (ML) methods were compared, and Shapley Additive Explanations (SHAP) values were used to interpret the model.
Results: The sample included 1,458 women newly diagnosed with HIV and 33,155 controls never diagnosed with HIV. The XGBoost model outperformed the other ML methods, achieving an area under the curve (AUC) of 89.3%. At the cutoff maximizing Youden's index, sensitivity and specificity were 83% and 82%, respectively, and 20% of women were classified as high risk; at the cutoff maximizing the F1 score, they were 42% and 97%, respectively, and 5% were classified as high risk. Of the 20 features with the highest SHAP values, 11 were related to SDoH.
Conclusion: The final model, which incorporates demographics, clinical features, and SDoH, can predict women's HIV risk over the next year, and several SDoH factors emerged as important predictors. Future work could involve stakeholders in integrating the model into HIV PrEP decision support and could explore causal pathways to guide risk-reduction interventions.
ISSN: 1471-2458
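The two operating points reported in the summary come from the same risk scores and differ only in how the probability cutoff is chosen: one maximizes Youden's index (J = sensitivity + specificity - 1), the other maximizes the F1 score. The sketch below illustrates that choice in pure Python. It assumes the model already outputs one risk score per person, and it treats every observed score as a candidate cutoff. The function names `metrics_at` and `best_cutoff` and the 20-record toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def metrics_at(y_true: Sequence[int], scores: Sequence[float], cutoff: float) -> Dict[str, float]:
    """Confusion-matrix metrics when every score >= cutoff is flagged as high risk."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= cutoff
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "cutoff": cutoff,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1.0,
        "f1": f1,
        "flagged": (tp + fp) / len(y_true),  # share of people labeled high risk
    }


def best_cutoff(y_true: Sequence[int], scores: Sequence[float], criterion: str) -> Dict[str, float]:
    """Scan every observed score as a candidate cutoff and keep the one maximizing `criterion`."""
    if criterion not in ("youden_j", "f1"):
        raise ValueError("criterion must be 'youden_j' or 'f1'")
    candidates: List[Dict[str, float]] = [
        metrics_at(y_true, scores, c) for c in sorted(set(scores))
    ]
    return max(candidates, key=lambda m: m[criterion])


# Hypothetical toy data: 4 cases (label 1) followed by 16 controls (label 0).
y = [1] * 4 + [0] * 16
s = [0.95, 0.90, 0.60, 0.40,
     0.85, 0.55, 0.50, 0.45, 0.35, 0.30, 0.25, 0.20,
     0.15, 0.12, 0.10, 0.08, 0.06, 0.05, 0.03, 0.02]

for rule in ("youden_j", "f1"):
    m = best_cutoff(y, s, rule)
    print(f"{rule}: cutoff={m['cutoff']:.2f} sens={m['sensitivity']:.2f} "
          f"spec={m['specificity']:.2f} flagged={m['flagged']:.0%}")
```

On this toy data the Youden cutoff is lower, so it flags a larger share of people with higher sensitivity, while the F1 cutoff flags fewer people with higher specificity. This mirrors the direction of the trade-off in the summary, not its values.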