Evaluating Imputation Methods to Improve Prediction Accuracy for an HIV Study in Uganda

Bibliographic Details
Main Authors: Nadia B. Mendoza, Chii-Dean Lin, Susan M. Kiene, Nicolas A. Menzies, Rhoda K. Wanyenze, Katherine A. Schmarje, Rose Naigino, Michael Ediau, Seth C. Kalichman, Barbara A. Bailey
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Stats
Subjects:
Online Access: https://www.mdpi.com/2571-905X/7/4/82
Description
Summary: Standard statistical analyses often exclude incomplete observations, which can be particularly problematic when predicting rare outcomes, such as HIV positivity. In the linkage to the HIV care dataset, there were initially 553 complete HIV-positive cases, with an additional 554 cases added through imputation. Imputation methods amelia, Hmisc, mice and missForest were evaluated. Simulations were conducted across various scenarios using the complete data to guide imputation for the full dataset. A random forest model was used to predict HIV status, assessing imputation precision, overall prediction accuracy, and sensitivity. While missForest produced imputed values closer to the observed ones, this did not translate into better predictive models. Hmisc and mice imputations led to higher prediction accuracy and sensitivity, with median accuracy increasing from 64% to 76% and median sensitivity rising from 0.4 to 0.75. Hmisc and amelia were the fastest imputation methods. Additionally, oversampling the minority class combined with undersampling the majority class did not improve predictions of new HIV-positive cases using only the complete observations. However, increasing the minority class information through imputation enhanced sensitivity for predicting cases in this class.
ISSN: 2571-905X
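
Note on the metrics in the summary: overall prediction accuracy and sensitivity can diverge sharply when the positive class is rare, which is why the summary reports both. The sketch below is only an illustration of the two standard definitions, not the authors' code. The function name `accuracy_and_sensitivity` and the toy labels are hypothetical and are not taken from the study data.

```python
def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for two equal-length label lists.

    accuracy    = correct predictions / all predictions
    sensitivity = true positives / actual positives (recall of the positive class)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)

    accuracy = correct / len(y_true)
    # Sensitivity is undefined when there are no actual positives.
    sensitivity = true_pos / actual_pos if actual_pos else float("nan")
    return accuracy, sensitivity


# Hypothetical toy labels (1 = HIV positive, 0 = HIV negative); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

In this toy example the classifier misses one of four positive cases and raises one false alarm, so accuracy and sensitivity are computed from different denominators (all ten cases versus the four actual positives).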