The Lifecycle of Electronic Health Record Data in HIV-Related Big Data Studies: Qualitative Study of Bias Instances and Potential Opportunities for Minimization

Abstract Background: Electronic health record (EHR) data are widely used in public health research, including in HIV-related studies, but are limited by potential bias due to incomplete and inaccurate information, lack of generalizability, and lack of representativeness....

Bibliographic Details
Main Authors: Arielle N'Diaye, Shan Qiao, Camryn Garrett, George Khushf, Jiajia Zhang, Xiaoming Li, Bankole Olatosi
Format: Article
Language: English
Published: JMIR Publications 2025-08-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2025/1/e71388
_version_ 1849762091629543424
author Arielle N'Diaye
Shan Qiao
Camryn Garrett
George Khushf
Jiajia Zhang
Xiaoming Li
Bankole Olatosi
collection DOAJ
description Abstract
Background: Electronic health record (EHR) data are widely used in public health research, including in HIV-related studies, but are limited by potential bias due to incomplete and inaccurate information, lack of generalizability, and lack of representativeness.
Objective: This study explores how workflow processes among HIV health care providers (HCPs), data scientists, and state health department professionals may potentially introduce or minimize bias within EHR data.
Methods: One focus group with 3 health department professionals working in HIV surveillance and 16 in-depth interviews (ie, 5 people with HIV, 5 HCPs, 5 data scientists, and 1 health department professional providing retention-in-care services) were conducted with participants purposively sampled in South Carolina from August 2023 to April 2024. All interviews were transcribed verbatim and analyzed using a constructivist grounded theory approach, where transcripts were first coded and then focused, axial, and theoretically coded.
Results: The EHR data lifecycle originates with people with HIV and HCPs in the clinical setting. Data scientists then curate EHR data and health department professionals manage and use the data for surveillance and policy decision-making. Throughout this lifecycle, the three primary stakeholders (ie, HCPs, data scientists, and health department professionals) identified challenges with EHR processes and provided their recommendations and accommodations in addressing the related challenges. HCPs reported the influence of socio-structural biases on their inquiry, interpretation, and documentation of social determinants of health (SDOH) information of people living with HIV, the influence of which is proposed to be mitigated through people living with HIV access to their EHRs. Data scientists identified limited data availability and representativeness as biasing the data they manage. Health department professionals face challenges with delayed and incomplete data, which may be addressed statistically but require consideration of the data’s limitations. Overall, bias within the EHR data lifecycle persists because workflows are not intentionally structured to minimize bias and there is a diffusion of responsibility for data quality between the various stakeholders.
Conclusions: From the perspective of various stakeholders, this study describes the EHR data lifecycle and its associated challenges as well as stakeholders’ accommodations and recommendations for mitigating and eliminating bias in EHR data. Based upon these findings, studies reliant on EHR data should adequately consider its challenges and limitations. Throughout the EHR data lifecycle, bias could be reduced through an inclusive, supportive health care environment, people living with HIV verification of SDOH information, the customization of data collection systems, and EHR data inspection for completeness, accuracy, and timeliness. Future research is needed to further identify instances where bias is introduced and how it can best be mitigated and eliminated across the EHR data lifecycle. Systematic changes are necessary to reduce instances of bias between data workflows and stakeholders.
format Article
id doaj-art-1e7ed3a4ea2e4e3ea1f73dbcdf2bf50b
institution DOAJ
issn 1438-8871
language English
publishDate 2025-08-01
publisher JMIR Publications
record_format Article
series Journal of Medical Internet Research
spelling doaj-art-1e7ed3a4ea2e4e3ea1f73dbcdf2bf50b 2025-08-20T03:05:50Z eng JMIR Publications Journal of Medical Internet Research 1438-8871 2025-08-01 27 e71388 e71388 10.2196/71388 The Lifecycle of Electronic Health Record Data in HIV-Related Big Data Studies: Qualitative Study of Bias Instances and Potential Opportunities for Minimization Arielle N'Diaye http://orcid.org/0000-0001-5332-498X Shan Qiao http://orcid.org/0000-0003-1834-1834 Camryn Garrett http://orcid.org/0000-0003-2569-0418 George Khushf http://orcid.org/0000-0001-7488-7458 Jiajia Zhang http://orcid.org/0000-0003-4566-0822 Xiaoming Li http://orcid.org/0000-0002-5555-9034 Bankole Olatosi http://orcid.org/0000-0002-8295-8735 https://www.jmir.org/2025/1/e71388
title The Lifecycle of Electronic Health Record Data in HIV-Related Big Data Studies: Qualitative Study of Bias Instances and Potential Opportunities for Minimization
url https://www.jmir.org/2025/1/e71388