Addressing selection biases within electronic health record data for estimation of diabetes prevalence among New York City young adults: a cross-sectional study
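| Main Authors: | Rebecca Anthopolos, Shannon M Farley, David C Lee, Lorna E Thorpe, Jasmin Divers, Sandra S Albrecht, Sarah Conderino |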
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMJ Publishing Group, 2024-10-01 |
| Series: | BMJ Public Health |
| Online Access: | https://bmjpublichealth.bmj.com/content/2/2/e001666.full |
| field | value |
|---|---|
| author | Rebecca Anthopolos, Shannon M Farley, David C Lee, Lorna E Thorpe, Jasmin Divers, Sandra S Albrecht, Sarah Conderino |
| collection | DOAJ |
| description | Introduction: There is growing interest in using electronic health records (EHRs) for chronic disease surveillance. However, these data are convenience samples of in-care individuals and are not representative of the target populations for public health surveillance, which are generally defined, for the relevant period, as the resident populations of a city, state or other jurisdiction. We focus on using EHR data to estimate diabetes prevalence among young adults in New York City, as the rising diabetes burden at younger ages calls for better surveillance capacity. Methods: This article applies common adjustment methods for non-probability samples, including raking, post-stratification and multilevel regression with post-stratification, to real and simulated data for the cross-sectional estimation of diabetes prevalence among those aged 18–44 years. Within the real data analyses, we externally validate city-level and neighbourhood-level EHR-based estimates against gold-standard estimates from a local health survey. Within the data simulations, we probe the extent to which residual biases remain when selection into the EHR sample is non-ignorable. Results: Within the real data analyses, these methods reduced the impact of selection biases on the citywide prevalence estimate relative to the gold standard. Residual biases remained at the neighbourhood level, where prevalence tended to be overestimated, especially in neighbourhoods where a higher proportion of residents were captured in the sample. Simulation results demonstrated that these methods may be sufficient except when selection into the EHR is non-ignorable, that is, when it depends on unmeasured factors or on diabetes status. Conclusions: While EHRs offer the potential to innovate in chronic disease surveillance, care is needed when estimating prevalence for small geographies or when selection is non-ignorable. |
| format | Article |
| id | doaj-art-4d73b60a3daa44ecab6d0e14a01e996d |
| institution | OA Journals |
| issn | 2753-4294 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | BMJ Publishing Group |
| record_format | Article |
| series | BMJ Public Health |
| doi | 10.1136/bmjph-2024-001666 |
| volume | 2 |
| issue | 2 |
| affiliations | Rebecca Anthopolos: Department of Population Health, New York University Grossman School of Medicine, New York, New York, USA; Shannon M Farley: Bureau of Chronic Disease Prevention and Tobacco Control, New York City Department of Health and Mental Hygiene, Queens, New York, USA; David C Lee: Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA; Lorna E Thorpe: Department of Population Health, New York University Grossman School of Medicine, New York, NY, USA; Jasmin Divers: Department of Foundations of Medicine, New York University Long Island School of Medicine, Mineola, New York, USA; Sandra S Albrecht: Carolina Population Center, University of North Carolina, Chapel Hill, North Carolina, USA; Sarah Conderino: Department of Population Health, New York University Grossman School of Medicine, New York, New York, USA |
| title | Addressing selection biases within electronic health record data for estimation of diabetes prevalence among New York City young adults: a cross-sectional study |
| url | https://bmjpublichealth.bmj.com/content/2/2/e001666.full |
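
The abstract names raking and post-stratification as ways to reweight the in-care EHR sample toward the resident population. Below is a minimal Python sketch of both. It is not the authors' code: the adjustment variables (age group, sex), the population totals and the sample records are hypothetical, chosen only to keep the arithmetic easy to follow. Multilevel regression with post-stratification would replace the raw cell means with model-based estimates from a multilevel regression; that fitting step is not shown.

```python
"""Sketch of post-stratification and raking for a prevalence estimate.

Hypothetical throughout: cell labels, adjustment variables and population
totals are illustrative placeholders, not the study's data or code.
"""
from collections import defaultdict


def poststratified_prevalence(sample, pop_counts):
    """Post-stratification: take the outcome mean within each cell, then
    weight each cell mean by that cell's known population count.

    sample: list of (cell, y) pairs with y = 1 (diabetes) or 0
    pop_counts: dict mapping cell -> resident population count
    Assumes every cell in pop_counts has at least one sampled person.
    """
    cases = defaultdict(float)
    n = defaultdict(int)
    for cell, y in sample:
        cases[cell] += y
        n[cell] += 1
    total = sum(pop_counts.values())
    return sum(pop_counts[c] * cases[c] / n[c] for c in pop_counts) / total


def rake(rows, margins, max_iter=100, tol=1e-10):
    """Raking (iterative proportional fitting): repeatedly rescale unit
    weights so the weighted totals match each known margin in turn.

    rows: list of dicts of categorical variables, one per sampled person
    margins: dict var -> {level: population total}; all margins should
        share one grand total, and every level must occur in the sample
    Returns one weight per row.
    """
    w = [1.0] * len(rows)
    for _ in range(max_iter):
        worst = 0.0  # largest deviation of any scaling factor from 1 this sweep
        for var, targets in margins.items():
            current = defaultdict(float)
            for wi, r in zip(w, rows):
                current[r[var]] += wi
            factors = {lvl: targets[lvl] / current[lvl] for lvl in targets}
            worst = max([worst] + [abs(f - 1.0) for f in factors.values()])
            w = [wi * factors[r[var]] for wi, r in zip(w, rows)]
        if worst < tol:  # every margin already matched: converged
            break
    return w


def weighted_prevalence(rows, weights, outcome="diabetes"):
    """Weighted share of rows whose outcome equals 1."""
    return sum(wi * r[outcome] for wi, r in zip(weights, rows)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical sample: two age cells, y = 1 if diabetes is recorded
    sample = [("18-29", 1), ("18-29", 0), ("18-29", 0), ("18-29", 0),
              ("30-44", 1), ("30-44", 0)]
    pop = {"18-29": 600, "30-44": 400}
    print(poststratified_prevalence(sample, pop))  # 0.35 (unadjusted: 2/6)

    rows = [
        {"age": "18-29", "sex": "F", "diabetes": 0},
        {"age": "18-29", "sex": "M", "diabetes": 1},
        {"age": "30-44", "sex": "F", "diabetes": 0},
        {"age": "30-44", "sex": "M", "diabetes": 1},
        {"age": "30-44", "sex": "M", "diabetes": 0},
    ]
    margins = {"age": {"18-29": 600, "30-44": 400},
               "sex": {"F": 520, "M": 480}}
    w = rake(rows, margins)
    print(round(weighted_prevalence(rows, w), 3))
```

Post-stratification needs the joint population distribution of the cells, while raking needs only the marginal totals, which is why raking is often used when cross-classified counts are unavailable. Either adjustment corrects only for differences captured by the adjustment variables; as the abstract notes, residual bias is expected when selection into the EHR depends on unmeasured factors or on diabetes status itself.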