The external validity of machine learning-based prediction scores from hematological parameters of COVID-19: A study using hospital records from Brazil, Italy, and Western Europe.

The unprecedented worldwide pandemic caused by COVID-19 has motivated several research groups to develop machine-learning-based approaches that aim to automate the diagnosis or screening of COVID-19 at large scale. The gold standard for COVID-19 detection, quantitative-Real-Time-Polymerase-Chain-Re...

Bibliographic Details
Main Authors:	Ali Safdari, Chanda Sai Keshav, Deepanshu Mody, Kshitij Verma, Utsav Kaushal, Vaadeendra Kumar Burra, Sibnath Ray, Debashree Bandyopadhyay
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0316467

_version_	1823864069979897856
author	Ali Safdari Chanda Sai Keshav Deepanshu Mody Kshitij Verma Utsav Kaushal Vaadeendra Kumar Burra Sibnath Ray Debashree Bandyopadhyay
collection	DOAJ
description	The unprecedented worldwide pandemic caused by COVID-19 has motivated several research groups to develop machine-learning-based approaches that aim to automate the diagnosis or screening of COVID-19 at large scale. The gold standard for COVID-19 detection, quantitative real-time polymerase chain reaction (qRT-PCR), is expensive and time-consuming. Haematology-based detection is a fast and near-accurate alternative, although it has been less explored. The external validity of haematology-based COVID-19 predictions on diverse populations is yet to be fully investigated. Here we report the external validity of machine learning-based prediction scores from haematological parameters recorded in different hospitals of Brazil, Italy, and Western Europe (raw sample size, 195554). Of the seven ML classifiers tested, the XGBoost classifier performed consistently better on all the datasets. The working models use a set of either four or fourteen haematological parameters. The internal performances of the XGBoost models (AUC scores ranging from 84% to 97%) were superior to those of ML models reported in the literature for some of these datasets (AUC scores ranging from 84% to 87%). Meta-validation of the external performances showed reliable performance (AUC score 86%) along with good accuracy of the probabilistic prediction (Brier score 14%), particularly when the model was trained and tested on fourteen haematological parameters from the same country (Brazil). The external performance was reduced when the model was trained on datasets from Italy and tested on Brazil (AUC score 69%) and Western Europe (AUC score 65%), presumably because of factors such as ethnicity, phenotype, immunity, and reference ranges that differ across the populations. The contribution of the present study is a COVID-19 prediction tool that is reliable and parsimonious: it uses fewer haematological features than the earlier study with meta-validation and is based on a sufficient sample size (n = 195554). Thus, the current models can be applied at other demographic locations, preferably with prior training of the model on the same population. Availability: https://covipred.bits-hyderabad.ac.in/home; https://github.com/debashreebanerjee/CoviPred.
format	Article
id	doaj-art-4113baaf85904047ab2b7213d72a844a
institution	Kabale University
issn	1932-6203
language	English
publishDate	2025-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj-art-4113baaf85904047ab2b7213d72a844a 2025-02-09T05:30:36Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2025-01-01 20 2 e0316467 10.1371/journal.pone.0316467 https://doi.org/10.1371/journal.pone.0316467
title	The external validity of machine learning-based prediction scores from hematological parameters of COVID-19: A study using hospital records from Brazil, Italy, and Western Europe.
url	https://doi.org/10.1371/journal.pone.0316467

Similar Items