The Lifecycle of Electronic Health Record Data in HIV-Related Big Data Studies: Qualitative Study of Bias Instances and Potential Opportunities for Minimization
Abstract. Background: Electronic health record (EHR) data are widely used in public health research, including in HIV-related studies, but are limited by potential bias due to incomplete and inaccurate information, lack of generalizability, and lack of representativeness....
| Main Authors: | Arielle N'Diaye, Shan Qiao, Camryn Garrett, George Khushf, Jiajia Zhang, Xiaoming Li, Bankole Olatosi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-08-01 |
| Series: | Journal of Medical Internet Research |
| Online Access: | https://www.jmir.org/2025/1/e71388 |
| _version_ | 1849762091629543424 |
|---|---|
| author | Arielle N'Diaye, Shan Qiao, Camryn Garrett, George Khushf, Jiajia Zhang, Xiaoming Li, Bankole Olatosi |
| author_sort | Arielle N'Diaye |
| collection | DOAJ |
| description |
Abstract
Background: Electronic health record (EHR) data are widely used in public health research, including HIV-related studies, but are limited by potential bias arising from incomplete and inaccurate information, lack of generalizability, and lack of representativeness.
Objective: This study explores how workflow processes among HIV health care providers (HCPs), data scientists, and state health department professionals may introduce or minimize bias within EHR data.
Methods: One focus group with 3 health department professionals working in HIV surveillance and 16 in-depth interviews (ie, 5 people with HIV, 5 HCPs, 5 data scientists, and 1 health department professional providing retention-in-care services) were conducted with participants purposively sampled in South Carolina from August 2023 to April 2024. All interviews were transcribed verbatim and analyzed using a constructivist grounded theory approach, in which transcripts were first coded and then subjected to focused, axial, and theoretical coding.
Results: The EHR data lifecycle originates with people with HIV and HCPs in the clinical setting. Data scientists then curate EHR data, and health department professionals manage and use the data for surveillance and policy decision-making. Throughout this lifecycle, the three primary stakeholders (ie, HCPs, data scientists, and health department professionals) identified challenges with EHR processes and described their recommendations and accommodations for addressing those challenges. HCPs reported that socio-structural biases influence their inquiry into, interpretation of, and documentation of social determinants of health (SDOH) information for people living with HIV; giving people living with HIV access to their EHRs was proposed as a way to mitigate this influence. Data scientists identified limited data availability and representativeness as biasing the data they manage. Health department professionals face challenges with delayed and incomplete data, which may be addressed statistically but require consideration of the data’s limitations. Overall, bias within the EHR data lifecycle persists because workflows are not intentionally structured to minimize bias and because responsibility for data quality is diffused among the various stakeholders.
Conclusions: From the perspective of various stakeholders, this study describes the EHR data lifecycle and its associated challenges, as well as stakeholders’ accommodations and recommendations for mitigating and eliminating bias in EHR data. Based upon these findings, studies reliant on EHR data should adequately consider the challenges and limitations of those data. Throughout the EHR data lifecycle, bias could be reduced through an inclusive, supportive health care environment; verification of SDOH information by people living with HIV; the customization of data collection systems; and inspection of EHR data for completeness, accuracy, and timeliness. Future research is needed to further identify instances where bias is introduced and how it can best be mitigated and eliminated across the EHR data lifecycle. Systematic changes are necessary to reduce instances of bias between data workflows and stakeholders. |
| format | Article |
| id | doaj-art-1e7ed3a4ea2e4e3ea1f73dbcdf2bf50b |
| institution | DOAJ |
| issn | 1438-8871 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | Journal of Medical Internet Research |
| spelling | doaj-art-1e7ed3a4ea2e4e3ea1f73dbcdf2bf50b; 2025-08-20T03:05:50Z; eng; JMIR Publications; Journal of Medical Internet Research; ISSN 1438-8871; 2025-08-01; volume 27; e71388; doi:10.2196/71388; The Lifecycle of Electronic Health Record Data in HIV-Related Big Data Studies: Qualitative Study of Bias Instances and Potential Opportunities for Minimization; Arielle N'Diaye (http://orcid.org/0000-0001-5332-498X), Shan Qiao (http://orcid.org/0000-0003-1834-1834), Camryn Garrett (http://orcid.org/0000-0003-2569-0418), George Khushf (http://orcid.org/0000-0001-7488-7458), Jiajia Zhang (http://orcid.org/0000-0003-4566-0822), Xiaoming Li (http://orcid.org/0000-0002-5555-9034), Bankole Olatosi (http://orcid.org/0000-0002-8295-8735); https://www.jmir.org/2025/1/e71388 |
| title | The Lifecycle of Electronic Health Record Data in HIV-Related Big Data Studies: Qualitative Study of Bias Instances and Potential Opportunities for Minimization |
| title_sort | lifecycle of electronic health record data in hiv related big data studies qualitative study of bias instances and potential opportunities for minimization |
| url | https://www.jmir.org/2025/1/e71388 |