The external validity of machine learning-based prediction scores from hematological parameters of COVID-19: A study using hospital records from Brazil, Italy, and Western Europe.

Bibliographic Details
Main Authors: Ali Safdari, Chanda Sai Keshav, Deepanshu Mody, Kshitij Verma, Utsav Kaushal, Vaadeendra Kumar Burra, Sibnath Ray, Debashree Bandyopadhyay
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0316467
collection DOAJ
description The COVID-19 pandemic has motivated several research groups to develop machine learning-based approaches that aim to automate the diagnosis or screening of COVID-19 at large scale. The gold standard for COVID-19 detection, quantitative real-time polymerase chain reaction (qRT-PCR), is expensive and time-consuming. Haematology-based detection, in contrast, is fast and nearly accurate but less explored, and the external validity of haematology-based COVID-19 predictions on diverse populations is yet to be fully investigated. Here we report the external validity of machine learning-based prediction scores from haematological parameters recorded in different hospitals of Brazil, Italy, and Western Europe (raw sample size, 195554). Of the seven ML classifiers compared, XGBoost performed consistently better than the others on all the datasets. The working models use a set of either four or fourteen haematological parameters. The internal performances of the XGBoost models (AUC scores from 84% to 97%) were superior to those of ML models reported in the literature for some of these datasets (AUC scores from 84% to 87%). Meta-validation of the external performances showed reliable performance (AUC score 86%) along with good accuracy of the probabilistic prediction (Brier score 14%), particularly when the model was trained and tested on fourteen haematological parameters from the same country (Brazil). The external performance was reduced when the model was trained on datasets from Italy and tested on Brazil (AUC score 69%) and Western Europe (AUC score 65%), presumably because of factors such as ethnicity, phenotype, immunity, and reference ranges that differ across the populations. The novelty of the present study is a COVID-19 prediction tool that is reliable and parsimonious, using fewer haematological features than the earlier study with meta-validation, and based on a sufficient sample size (n = 195554). Thus, the current models can be applied at other demographic locations, preferably after prior training of the model on the same population. Availability: https://covipred.bits-hyderabad.ac.in/home; https://github.com/debashreebanerjee/CoviPred.
id doaj-art-4113baaf85904047ab2b7213d72a844a
institution Kabale University
issn 1932-6203
spelling doaj-art-4113baaf85904047ab2b7213d72a844a; 2025-02-09T05:30:36Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 2; e0316467; 10.1371/journal.pone.0316467