Addressing selection biases within electronic health record data for estimation of diabetes prevalence among New York City young adults: a cross-sectional study

Bibliographic Details
Main Authors: Rebecca Anthopolos, Shannon M Farley, David C Lee, Lorna E Thorpe, Jasmin Divers, Sandra S Albrecht, Sarah Conderino
Format: Article
Language: English
Published: BMJ Publishing Group, 2024-10-01
Series: BMJ Public Health
Online Access: https://bmjpublichealth.bmj.com/content/2/2/e001666.full
Collection: DOAJ
ISSN: 2753-4294
DOI: 10.1136/bmjph-2024-001666
Volume: 2
Issue: 2
Author affiliations:
Rebecca Anthopolos: Department of Population Health, New York University Grossman School of Medicine, New York, New York, USA
Shannon M Farley: Bureau of Chronic Disease Prevention and Tobacco Control, New York City Department of Health and Mental Hygiene, Queens, New York, USA
David C Lee: Department of Population Health, NYU Grossman School of Medicine, New York, New York, USA
Lorna E Thorpe: Department of Population Health, New York University Grossman School of Medicine, New York, New York, USA
Jasmin Divers: Department of Foundations of Medicine, New York University Long Island School of Medicine, Mineola, New York, USA
Sandra S Albrecht: Carolina Population Center, University of North Carolina, Chapel Hill, North Carolina, USA
Sarah Conderino: Department of Population Health, New York University Grossman School of Medicine, New York, New York, USA

Description:
Introduction: There is growing interest in using electronic health records (EHRs) for chronic disease surveillance. However, these data are convenience samples of in-care individuals and are not representative of the target populations for public health surveillance. Those populations are generally defined, for the relevant period, as the resident populations of a city, state or other jurisdiction. We focus on using EHR data to estimate diabetes prevalence among young adults in New York City, as the rising diabetes burden at younger ages calls for better surveillance capacity.

Methods: This article applies common adjustment methods for non-probability samples to real and simulated data for the cross-sectional estimation of diabetes prevalence among those aged 18–44 years. These methods include raking, post-stratification and multilevel regression with post-stratification (MRP). In the real-data analyses, we externally validate city-level and neighbourhood-level EHR-based estimates against gold-standard estimates from a local health survey. In the data simulations, we probe the extent to which residual biases remain when selection into the EHR sample is non-ignorable.

Results: In the real-data analyses, these methods reduced the impact of selection biases on the citywide prevalence estimate relative to the gold standard. Residual biases remained at the neighbourhood level, where prevalence tended to be overestimated. This was especially true in neighbourhoods where a higher proportion of residents were captured in the sample. Simulation results indicated that these methods may be sufficient except when selection into the EHR is non-ignorable, that is, when it depends on unmeasured factors or on diabetes status itself.

Conclusions: While EHRs offer the potential to innovate in chronic disease surveillance, care is needed when estimating prevalence for small geographies or when selection is non-ignorable.
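The abstract's Methods section names raking and post-stratification but does not say how either one reweights an EHR sample. The following is a minimal Python sketch of both techniques on an entirely hypothetical toy sample. The records, the age and sex cells and the population counts are invented for illustration; they are not the study's data, code or results. The helper names make_records, post_stratify, rake and weighted_prevalence are likewise hypothetical.

Post-stratification needs the population count for every joint cell. It averages the cell-specific sample prevalences, weighting each by its population share. Raking (iterative proportional fitting) needs only the marginal total of each variable. It rescales per-record weights, one variable at a time, until the weighted margins match those totals.

```python
from collections import defaultdict

# Hypothetical toy data, NOT the study's EHR sample. Each record carries the
# adjustment variables (age group, sex) and a 0/1 diabetes indicator.
def make_records(age, sex, n, cases):
    return [{"age": age, "sex": sex, "diabetes": int(i < cases)} for i in range(n)]

sample = (make_records("18-29", "F", 20, 1) + make_records("18-29", "M", 10, 1)
          + make_records("30-44", "F", 40, 4) + make_records("30-44", "M", 30, 6))

# Hypothetical population counts for each joint (age, sex) cell, e.g. from a census.
population = {("18-29", "F"): 250, ("18-29", "M"): 250,
              ("30-44", "F"): 250, ("30-44", "M"): 250}


def post_stratify(records, pop_counts, keys):
    """Average cell-specific sample prevalences, weighted by population share."""
    cases, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        cell = tuple(rec[k] for k in keys)
        totals[cell] += 1
        cases[cell] += rec["diabetes"]
    n_pop = sum(pop_counts.values())
    estimate = 0.0
    for cell, n in pop_counts.items():
        if totals[cell] == 0:
            # An empty cell cannot be estimated directly; collapse cells or
            # use a model-based approach such as MRP instead.
            raise ValueError(f"no sample records in cell {cell}")
        estimate += (n / n_pop) * cases[cell] / totals[cell]
    return estimate


def rake(records, margins, max_iter=1000, tol=1e-10):
    """Iterative proportional fitting: rescale per-record weights, one variable
    at a time, until the weighted margins match the population margins."""
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, targets in margins.items():
            # Current weighted total for each level of this variable.
            current = defaultdict(float)
            for w, rec in zip(weights, records):
                current[rec[var]] += w
            # Scale every record so this variable's margin hits its target.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / current[rec[var]]
                weights[i] *= factor
                largest_change = max(largest_change, abs(factor - 1.0))
        if largest_change < tol:
            break
    return weights


def weighted_prevalence(records, weights):
    return sum(w * r["diabetes"] for w, r in zip(weights, records)) / sum(weights)


crude = weighted_prevalence(sample, [1.0] * len(sample))
post = post_stratify(sample, population, keys=("age", "sex"))
margins = {"age": {"18-29": 500, "30-44": 500}, "sex": {"F": 500, "M": 500}}
weights = rake(sample, margins)
raked = weighted_prevalence(sample, weights)

print(f"crude {crude:.4f}, post-stratified {post:.4f}, raked {raked:.4f}")
```

Both adjustments assume that, within the adjustment cells, people captured in the EHR resemble the wider population. Neither can correct selection that depends on unmeasured factors or on diabetes status itself, which is the non-ignorable case the abstract flags. MRP, the third method named, replaces the raw cell prevalences with estimates from a multilevel model, so that sparse cells (such as small neighbourhoods) borrow strength from one another. It is not sketched here.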