The external validity of machine learning-based prediction scores from hematological parameters of COVID-19: A study using hospital records from Brazil, Italy, and Western Europe.
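Main Authors: | Ali Safdari, Chanda Sai Keshav, Deepanshu Mody, Kshitij Verma, Utsav Kaushal, Vaadeendra Kumar Burra, Sibnath Ray, Debashree Bandyopadhyay |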
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2025-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0316467 |
author | Ali Safdari, Chanda Sai Keshav, Deepanshu Mody, Kshitij Verma, Utsav Kaushal, Vaadeendra Kumar Burra, Sibnath Ray, Debashree Bandyopadhyay |
collection | DOAJ |
description | The worldwide COVID-19 pandemic has motivated several research groups to develop machine-learning-based approaches that aim to automate the diagnosis or screening of COVID-19 at large scale. The gold standard for COVID-19 detection, quantitative real-time polymerase chain reaction (qRT-PCR), is expensive and time-consuming. Haematology-based detection, by contrast, is fast and near-accurate but has been less explored. The external validity of haematology-based COVID-19 predictions on diverse populations is yet to be fully investigated. Here we report the external validity of machine learning-based prediction scores from haematological parameters recorded in different hospitals of Brazil, Italy, and Western Europe (raw sample size, 195554). The XGBoost classifier consistently performed best among the seven ML classifiers evaluated on all the datasets. The working models use a set of either four or fourteen haematological parameters. The internal performance of the XGBoost models (AUC scores ranging from 84% to 97%) was superior to that of ML models reported in the literature for some of these datasets (AUC scores ranging from 84% to 87%). The meta-validation of the external performance showed reliable performance (AUC score 86%) along with good accuracy of the probabilistic prediction (Brier score 14%), particularly when the model was trained and tested on fourteen haematological parameters from the same country (Brazil). The external performance was reduced when the model was trained on datasets from Italy and tested on Brazil (AUC score 69%) and Western Europe (AUC score 65%), presumably because of factors such as ethnicity, phenotype, immunity, and reference ranges that differ across populations. The contribution of the present study is a COVID-19 prediction tool that is reliable and parsimonious, using fewer haematological features than the earlier study with meta-validation, and based on a sufficient sample size (n = 195554). The current models can thus be applied at other demographic locations, preferably after training the model on the same population. Availability: https://covipred.bits-hyderabad.ac.in/home; https://github.com/debashreebanerjee/CoviPred. |
id | doaj-art-4113baaf85904047ab2b7213d72a844a |
institution | Kabale University |
issn | 1932-6203 |
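
The description above reports model quality as AUC and Brier scores. As a minimal sketch of what those two metrics measure, the snippet below computes both in plain Python. The labels and probabilities are made-up toy values, not data from the study, and the function names `brier_score` and `roc_auc` are illustrative assumptions rather than anything taken from the CoviPred code.

```python
def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and the 0/1 outcome.

    Lower is better; 0 means perfectly calibrated, confident predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = test-positive, 0 = test-negative.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(f"AUC = {auc:.3f}, Brier = {brier:.3f}")
```

In an external-validation setting such as the one described, the same two functions would be applied to predictions made on a population the model was not trained on, so that a drop in AUC or a rise in Brier score indicates reduced transferability.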