A machine learning PROGRAM to identify COVID-19 and other diseases from hematology data

Bibliographic Details
Main Authors: Patrick A Gladding, Zina Ayar, Kevin Smith, Prashant Patel, Julia Pearce, Shalini Puwakdandawa, Dianne Tarrant, Jon Atkinson, Elizabeth McChlery, Merit Hanna, Nick Gow, Hasan Bhally, Kerry Read, Prageeth Jayathissa, Jonathan Wallace, Sam Norton, Nick Kasabov, Cristian S Calude, Deborah Steel, Colin Mckenzie
Format: Article
Language: English
Published: Taylor & Francis Group 2021-08-01
Series: Future Science OA
Online Access: https://www.future-science.com/doi/10.2144/fsoa-2020-0207
Description
Summary: Aim: We propose a method for screening full blood count metadata for evidence of communicable and noncommunicable diseases using machine learning (ML). Materials & methods: High-dimensional hematology metadata were extracted over an 11-month period from Sysmex hematology analyzers for 43,761 patients. Predictive models for age, sex and individuality were developed to demonstrate the personalized nature of hematology data. Numeric and raw flow cytometry data were used in both supervised and unsupervised ML to predict the presence of pneumonia, urinary tract infection and COVID-19. Heart failure was used as an additional objective to test the generalizability of the method. Results: A deep neural network predicted chronological age with R²: 0.59, mean absolute error: 12; sex with area under the receiver operating characteristic curve (AUROC): 0.83, phi: 0.47; individuality with 99.7% accuracy, phi: 0.97; pneumonia with AUROC: 0.74, sensitivity 58%, specificity 79%, 95% CI: 0.73–0.75, p < 0.0001; urinary tract infection with AUROC: 0.68, sensitivity 52%, specificity 79%, 95% CI: 0.67–0.68, p < 0.0001; COVID-19 with AUROC: 0.8, sensitivity 82%, specificity 75%, 95% CI: 0.79–0.8, p = 0.0006; and heart failure with AUROC: 0.78, sensitivity 72%, specificity 72%, 95% CI: 0.77–0.78, p < 0.0001. Conclusion: ML applied to hematology data could predict communicable and noncommunicable diseases, at both local and global levels.
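
For readers unfamiliar with the metrics quoted in the summary, the following is a minimal sketch of how sensitivity, specificity, the phi coefficient and AUROC are commonly computed for a binary classifier. The function names, the toy confusion-matrix counts and the toy scores are assumptions made purely for illustration; they are not taken from the article and do not reproduce its models or data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and phi from 2x2 confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases that were flagged
    specificity = tn / (tn + fp)  # share of non-cases that were cleared
    # Phi (also called the Matthews correlation coefficient) ranges from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, phi


def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, for illustration only.
sens, spec, phi = binary_metrics(tp=40, fp=20, tn=30, fn=10)
area = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
print(f"sensitivity={sens:.2f} specificity={spec:.2f} phi={phi:.2f} AUROC={area:.2f}")
```

The pairwise AUROC above is quadratic in the number of cases and is meant only to make the definition concrete; a rank-based routine from an established statistics library would be the usual choice for a cohort of this size.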
ISSN: 2056-5623