An investigation into the impact of temporality on COVID-19 infection and mortality predictions: new perspective based on Shapley Values

Abstract Introduction Machine learning models have been employed to predict COVID-19 infections and mortality, but many models were built on training and testing sets from different periods. The purpose of this study is to investigate the impact of temporality, i.e., the temporal gap between trainin...

Bibliographic Details
Main Authors: Mingming Chen, Qihang Qian, Xiang Pan, Tenglong Li
Format: Article
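Language: English
Published: BMC 2025-04-01
Series: BMC Medical Research Methodology
Subjects: Temporality import; Shapley values; Random forest; COVID-19 infection prediction; COVID-19 mortality prediction
Online Access: https://doi.org/10.1186/s12874-025-02572-8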
author Mingming Chen
Qihang Qian
Xiang Pan
Tenglong Li
collection DOAJ
description Abstract
Introduction: Machine learning models have been employed to predict COVID-19 infections and mortality, but many models were built on training and testing sets from different periods. The purpose of this study is to investigate the impact of temporality, i.e., the temporal gap between training and testing sets, on model performance for predicting COVID-19 infections and mortality. Furthermore, this study seeks to understand the causes of the impact of temporality.
Methods: This study used a COVID-19 surveillance dataset collected in Brazil in 2020, 2021, and 2022, and built prediction models for COVID-19 infections and mortality using random forest and logistic regression, with 20 model features. Models were trained and tested on data from different years, as well as on data from the same year, to examine the impact of temporality. To further explain the impact of temporality and its driving factors, Shapley values were employed to quantify individual feature contributions to model predictions.
Results: For the infection model, the temporal gap had a negative impact on prediction accuracy. On average, the loss in accuracy was 0.0256 for logistic regression and 0.0436 for random forest when there was a temporal gap between the training and testing sets. For the mortality model, the loss in accuracy was 0.0144 for logistic regression and 0.0098 for random forest, which means the impact of temporality was not as strong as in the infection model. Shapley values uncovered the reason behind these differences between the infection and mortality models.
Conclusions: Our study confirmed the negative impact of temporality on model performance for predicting COVID-19 infections, but it did not find such a negative impact of temporality for predicting COVID-19 mortality. Shapley values revealed that a fixed set of four features made predominant contributions to the mortality model across all three years of data (2020–2022), while for the infection model there was no such fixed set of features across different years.
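As an illustration of how Shapley values attribute a single prediction to individual features, the following is a minimal sketch and not the authors' pipeline. It assumes a toy logistic model with four hypothetical features and made-up weights (the study itself used 20 features with random forest and logistic regression), and it treats a feature as "absent" by setting it to a baseline value, which is one common convention among several.

```python
from itertools import permutations
from math import exp, factorial


def predict(x, weights, bias):
    """Toy logistic model: probability of the positive class."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + exp(-z))


def shapley_values(x, baseline, weights, bias):
    """Exact Shapley values for one instance.

    Averages each feature's marginal contribution over every ordering
    in which features are switched from the baseline value to the
    instance value. Cost grows factorially, so this is only practical
    for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = predict(current, weights, bias)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = predict(current, weights, bias)
            phi[i] += new - prev       # marginal contribution of i
            prev = new
    total = factorial(n)
    return [p / total for p in phi]


# Hypothetical weights, bias, and inputs, chosen only for illustration.
weights = [1.2, -0.7, 0.4, 2.0]
bias = -0.5
baseline = [0.0, 0.0, 0.0, 0.0]
x = [1.0, 1.0, 0.0, 1.0]

phi = shapley_values(x, baseline, weights, bias)
for i, value in enumerate(phi):
    print(f"feature {i}: {value:+.4f}")
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and a feature whose value equals its baseline receives a contribution of zero. Comparing which features carry the largest contributions in models trained on different years is the kind of analysis the abstract describes.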
format Article
id doaj-art-86ae0f59b08e44ab87426fa1ba3cf986
institution Kabale University
issn 1471-2288
language English
publishDate 2025-04-01
publisher BMC
record_format Article
series BMC Medical Research Methodology
spelling doaj-art-86ae0f59b08e44ab87426fa1ba3cf986 2025-08-20T03:53:31Z eng BMC BMC Medical Research Methodology 1471-2288 2025-04-01 25 1 1 12 10.1186/s12874-025-02572-8
Mingming Chen (0), Qihang Qian (1), Xiang Pan (2), Tenglong Li (3)
0: Academy of Pharmacy, Xi’an Jiaotong-Liverpool University
1: School of Computer Science and Technology, Zhejiang University of Technology
2: School of Computer Science and Technology, Zhejiang University of Technology
3: Academy of Pharmacy, Xi’an Jiaotong-Liverpool University
title An investigation into the impact of temporality on COVID-19 infection and mortality predictions: new perspective based on Shapley Values
topic Temporality import
Shapley values
Random forest
COVID-19 infection prediction
COVID-19 mortality prediction
url https://doi.org/10.1186/s12874-025-02572-8