Predictive machine-learning model for screening iron deficiency without anaemia: a retrospective cohort study
| Main Authors: | Girish N Nadkarni, Orly Efros, Eyal Klang, Shelly Soffer, Gili Kenet, Aya Mudrik, Renana Robinson |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMJ Publishing Group, 2025-08-01 |
| Series: | BMJ Open |
| Online Access: | https://bmjopen.bmj.com/content/15/8/e097016.full |
| field | value |
|---|---|
| author | Girish N Nadkarni, Orly Efros, Eyal Klang, Shelly Soffer, Gili Kenet, Aya Mudrik, Renana Robinson |
| collection | DOAJ |
| description | Objectives: This study aimed to develop and validate a machine-learning (ML) model to predict iron deficiency without anaemia (IDWA) using routinely collected electronic health record (EHR) data. The primary hypothesis was that an ML model could identify low ferritin levels (<30 ng/mL) in non-anaemic patients more accurately than traditional methods. Design: A retrospective cohort study. Setting: Data were derived from secondary and tertiary care facilities within the eight-hospital Mount Sinai Health System, an urban academic health system. Participants: The study included 211 486 adult patients (aged ≥18 years) with normal haemoglobin levels (≥130 g/L for men and ≥120 g/L for women) and recorded ferritin measurements. Primary and secondary outcome measures: The primary outcome was the prediction of low ferritin levels (<30 ng/mL) using extreme gradient-boosted decision trees, an ML algorithm suited to structured clinical data. Secondary outcomes included subgroup analyses stratified by sex and age to evaluate model performance in different populations. Data from 211 486 Mount Sinai Health System patients with normal haemoglobin levels and ferritin testing were analysed. The model used demographic data, blood count indices and chemistry results to identify low ferritin levels (<30 ng/mL). Results: Of the 211 486 patients analysed, 19.56% (n=41 368) had low ferritin levels. In the low ferritin group, the mean age was 41.28 years and 89.64% were female. In contrast, the normal ferritin group had a mean age of 50.14 years and 62.02% were female. The model achieved an area under the curve (AUC) of 0.814. At a sensitivity threshold of 70%, the model had a specificity of 75.85%, a positive predictive value of 37.6% and a negative predictive value of 92.41%. The model outperformed an alternative model based only on complete blood count indices (AUC 0.814 vs 0.741). Subgroup analysis showed that model accuracy varied by sex and age, with lower performance in premenopausal women (AUC 0.736) than in postmenopausal women (AUC 0.793) and men (AUC 0.832 in those under 60 years and 0.806 in those aged 60 and above). Conclusions: The ML model provides an effective approach to screening for IDWA using readily available EHR data. Implementing this tool in clinical settings may facilitate early diagnosis of IDWA. |
| format | Article |
| id | doaj-art-0e94d88ffb6b4bacaf8db427d8b68d34 |
| institution | Kabale University |
| issn | 2044-6055 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | BMJ Publishing Group |
| record_format | Article |
| series | BMJ Open |
| doi | 10.1136/bmjopen-2024-097016 |
| volume_issue | 15(8), e097016 |
| affiliations | Girish N Nadkarni: The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, New York, USA; Orly Efros: National Hemophilia Center and Institute of Thrombosis & Hemostasis, Chaim Sheba Medical Center, Tel Hashomer, Israel; Eyal Klang: The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, New York, USA; Shelly Soffer: Gray Faculty of Medical and Health Sciences, Tel Aviv-Yafo, Israel; Gili Kenet: National Hemophilia Center and Institute of Thrombosis & Hemostasis, Chaim Sheba Medical Center, Tel Hashomer, Israel; Aya Mudrik: Ben-Gurion University of the Negev, Beer-Sheva, Israel; Renana Robinson: Gray Faculty of Medical and Health Sciences, Tel Aviv-Yafo, Israel |
| title | Predictive machine-learning model for screening iron deficiency without anaemia: a retrospective cohort study |
| url | https://bmjopen.bmj.com/content/15/8/e097016.full |
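The Participants and outcome statements in the description fix both the cohort and the prediction target. The sketch below encodes those rules as stated: age ≥18 years, haemoglobin ≥130 g/L for men or ≥120 g/L for women, a recorded ferritin value, and label 1 when ferritin is below 30 ng/mL. The `PatientRecord` type, its field names and the one-value-per-patient simplification are hypothetical; the abstract does not say how repeated measurements were handled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Thresholds as stated in the abstract.
MIN_AGE_YEARS = 18
NORMAL_HB_G_PER_L = {"M": 130.0, "F": 120.0}  # lower limit of normal haemoglobin by sex
LOW_FERRITIN_NG_PER_ML = 30.0                 # ferritin below this is labelled "low"


@dataclass
class PatientRecord:
    """Hypothetical one-row-per-patient EHR extract (field names are assumptions)."""
    patient_id: str
    age_years: float
    sex: str                              # "M" or "F"
    haemoglobin_g_per_l: float
    ferritin_ng_per_ml: Optional[float]   # None when no ferritin was recorded


def is_eligible(p: PatientRecord) -> bool:
    """Adult, normal haemoglobin for sex, and a recorded ferritin value."""
    if p.age_years < MIN_AGE_YEARS or p.ferritin_ng_per_ml is None:
        return False
    threshold = NORMAL_HB_G_PER_L.get(p.sex)
    return threshold is not None and p.haemoglobin_g_per_l >= threshold


def build_cohort(records: Iterable[PatientRecord]) -> List[Tuple[str, int]]:
    """Return (patient_id, label) pairs; label 1 means ferritin < 30 ng/mL."""
    cohort = []
    for p in records:
        if is_eligible(p):
            label = int(p.ferritin_ng_per_ml < LOW_FERRITIN_NG_PER_ML)
            cohort.append((p.patient_id, label))
    return cohort


# Toy records, not study data.
records = [
    PatientRecord("a", 34, "F", 125.0, 12.0),   # eligible, low ferritin
    PatientRecord("b", 52, "M", 128.0, 80.0),   # excluded: Hb below 130 g/L for a man
    PatientRecord("c", 17, "F", 130.0, 20.0),   # excluded: under 18
    PatientRecord("d", 65, "M", 145.0, None),   # excluded: no ferritin recorded
    PatientRecord("e", 45, "M", 140.0, 150.0),  # eligible, normal ferritin
]
print(build_cohort(records))  # [('a', 1), ('e', 0)]
```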
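Model performance is summarised as the area under the curve (AUC), both overall and within sex and age subgroups. As a minimal sketch, assuming the AUC is the usual ROC AUC, it equals the probability that a randomly chosen low-ferritin patient receives a higher score than a randomly chosen normal-ferritin patient. The subgroup figures then come from applying the same function within each stratum. The scores, labels and stratum names below are toy values, and `auc_by_subgroup` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as one half. Quadratic in the number of pairs: a sketch only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(
    scores: Sequence[float], labels: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Compute the AUC separately within each subgroup (e.g. sex/age strata)."""
    buckets: Dict[Hashable, List[Tuple[float, int]]] = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        buckets[g].append((s, y))
    return {
        g: auc([s for s, _ in rows], [y for _, y in rows])
        for g, rows in buckets.items()
    }


# Toy example: hypothetical scores, labels (1 = ferritin < 30 ng/mL) and strata.
scores = [0.9, 0.2, 0.8, 0.4, 0.7, 0.6, 0.3, 0.5]
labels = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["stratum_a"] * 4 + ["stratum_b"] * 4
print(auc(scores, labels))                      # 0.9375
print(auc_by_subgroup(scores, labels, groups))  # {'stratum_a': 1.0, 'stratum_b': 0.75}
```

Because each stratum has its own mix of easy and hard cases, subgroup AUCs can sit above or below the pooled value, as the abstract's figures for premenopausal women (0.736) and younger men (0.832) around the overall 0.814 illustrate.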
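The abstract reports specificity at a sensitivity threshold of 70%. One common way to obtain such an operating point, sketched here on toy data, is to scan candidate cut-offs from the highest score downwards. The scan stops at the first cut-off whose sensitivity reaches the target and then reports specificity at that cut-off. The abstract does not describe the authors' exact procedure, so `threshold_at_sensitivity` is a hypothetical helper.

```python
from typing import List, Tuple


def threshold_at_sensitivity(
    scores: List[float], labels: List[int], target: float
) -> Tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) for the highest cut-off whose
    sensitivity is >= target, predicting positive when score >= cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative cases")
    # Candidate cut-offs: the distinct observed scores, highest first.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        sensitivity = tp / positives
        if sensitivity >= target:
            tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
            return t, sensitivity, tn / negatives
    # The lowest score flags everyone (sensitivity 1.0), so only target > 1 gets here.
    raise ValueError("target sensitivity not reachable")


# Toy scores and labels (1 = low ferritin), not study data.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
t, sens, spec = threshold_at_sensitivity(scores, labels, 0.70)
print(f"cut-off={t}, sensitivity={sens:.2f}, specificity={spec:.3f}")
# cut-off=0.6, sensitivity=0.75, specificity=0.833
```

The nested counting keeps the example short. For a cohort of 211 486 patients one would instead sort once and sweep the cut-off, or use a library ROC routine.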
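Unlike sensitivity and specificity, the positive and negative predictive values depend on how common low ferritin is in the population being screened. The sketch applies Bayes' theorem to the reported operating point (sensitivity 70%, specificity 75.85%). At the whole-cohort prevalence of 19.56% it gives a PPV of about 41% and an NPV of about 91%. Inverting the PPV formula shows that the reported pair (37.6% and 92.41%) corresponds to a prevalence of roughly 17%. The abstract does not state the prevalence in the data used for these figures, so this is only an arithmetic consistency check. The helper names are hypothetical.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem for a test applied at `prevalence`."""
    tp = sensitivity * prevalence              # true positives per patient screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


def prevalence_for_ppv(sensitivity: float, specificity: float, ppv: float) -> float:
    """Invert the PPV formula: the prevalence at which the test attains `ppv`."""
    fpr = 1 - specificity
    return ppv * fpr / (sensitivity * (1 - ppv) + ppv * fpr)


SENS, SPEC = 0.70, 0.7585  # operating point reported in the abstract

ppv, npv = predictive_values(SENS, SPEC, 0.1956)  # whole-cohort prevalence
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")            # PPV=0.413, NPV=0.912

p = prevalence_for_ppv(SENS, SPEC, 0.376)         # prevalence implied by PPV 37.6%
print(f"prevalence={p:.3f}, NPV={predictive_values(SENS, SPEC, p)[1]:.3f}")
# prevalence=0.172, NPV=0.924
```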