Evaluating predictive performance, validity, and applicability of machine learning models for predicting HIV treatment interruption: a systematic review

Bibliographic Details
Main Authors: Williams Kwarah, Frances Baaba da-Costa Vroom, Duah Dwomoh, Samuel Bosomprah
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Global and Public Health
Online Access: https://doi.org/10.1186/s44263-025-00184-4
Description
Summary: Abstract

Background: HIV treatment interruption remains a significant barrier to achieving global HIV/AIDS control goals. Machine learning (ML) models offer potential for predicting treatment interruption by leveraging large clinical datasets. Understanding how these models were developed, validated, and applied remains essential for advancing research.

Methods: We searched databases including PubMed, BMC, Cochrane Library, Scopus, ScienceDirect, Lancet, and Google Scholar for studies published in English from 1990 to September 2024. Search terms covered HIV, machine learning, treatment interruption, and loss to follow-up. Articles were screened and reviewed independently, and data were extracted using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) tool. Risk of bias was assessed with the Prediction model Risk Of Bias Assessment Tool (PROBAST). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed throughout.

Results: Of 116,672 records, 9 studies met the inclusion criteria, reporting 12 ML models. Random Forest, XGBoost, and AdaBoost were the predominant models (91.7%). Internal validation was performed for all models, but only two models included external validation. Performance varied, with a mean area under the receiver operating characteristic curve (AUC-ROC) of 0.668 (standard deviation (SD) = 0.066), indicating moderate discrimination. About 75% of models showed a high risk of bias due to inadequate handling of missing data, lack of calibration, and the absence of decision curve analysis (DCA).

Conclusions: ML models show promise for predicting HIV treatment interruption, particularly in resource-limited settings. Future research should prioritize external validation, robust missing-data handling, and decision curve analysis, and should include sociocultural predictors to improve model robustness.
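To make the reported discrimination metric concrete, the following is a minimal sketch of how an internal-validation AUC-ROC is typically computed for a classifier such as Random Forest (one of the predominant models in the review). This example is illustrative only: the data are synthetic, and the predictor names (age, months on ART, distance to clinic) are hypothetical assumptions, not variables taken from the included studies.

```python
# Illustrative sketch (not from the review): computing AUC-ROC for a
# Random Forest on a synthetic binary outcome (1 = treatment interruption).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
# Hypothetical predictors: age (years), months on ART, distance to clinic (km)
X = np.column_stack([
    rng.normal(35, 10, n),
    rng.exponential(24, n),
    rng.exponential(8, n),
])
# Synthetic outcome, weakly linked to the predictors via a logistic model
logits = -1.0 + 0.03 * X[:, 2] - 0.01 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Simple split-sample internal validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# AUC-ROC on held-out data: 0.5 = no discrimination, 1.0 = perfect
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Internal-validation AUC-ROC: {auc:.3f}")
```

On this scale, the pooled mean of 0.668 reported in the review sits well above chance (0.5) but far from strong discrimination, which is why the authors characterize it as moderate.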
Systematic review registration PROSPERO CRD42024578109.
ISSN: 2731-913X