An investigation into the impact of temporality on COVID-19 infection and mortality predictions: new perspective based on Shapley Values
Abstract. Introduction: Machine learning models have been employed to predict COVID-19 infections and mortality, but many models were built on training and testing sets from different periods. The purpose of this study is to investigate the impact of temporality, i.e., the temporal gap between trainin...
| Main Authors: | Mingming Chen, Qihang Qian, Xiang Pan, Tenglong Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC 2025-04-01 |
| Series: | BMC Medical Research Methodology |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s12874-025-02572-8 |
| _version_ | 1849311174908182528 |
|---|---|
| author | Mingming Chen, Qihang Qian, Xiang Pan, Tenglong Li |
| author_facet | Mingming Chen, Qihang Qian, Xiang Pan, Tenglong Li |
| author_sort | Mingming Chen |
| collection | DOAJ |
| description | Abstract. Introduction: Machine learning models have been employed to predict COVID-19 infections and mortality, but many models were built on training and testing sets from different periods. The purpose of this study is to investigate the impact of temporality, i.e., the temporal gap between training and testing sets, on model performance in predicting COVID-19 infections and mortality. Furthermore, this study seeks to understand the causes of that impact. Methods: This study used a COVID-19 surveillance dataset collected in Brazil in 2020, 2021 and 2022, and built prediction models for COVID-19 infections and mortality using random forest and logistic regression, with 20 model features. Models were trained and tested on data from different years as well as from the same year, to examine the impact of temporality. To further explain the impact of temporality and its driving factors, Shapley values were employed to quantify individual feature contributions to model predictions. Results: For the infection model, we found that the temporal gap had a negative impact on prediction accuracy. On average, the loss in accuracy was 0.0256 for logistic regression and 0.0436 for random forest when there was a temporal gap between the training and testing sets. For the mortality model, the loss in accuracy was 0.0144 for logistic regression and 0.0098 for random forest, which means the impact of temporality was not as strong as in the infection model. Shapley values uncovered the reason behind these differences between the infection and mortality models. Conclusions: Our study confirmed the negative impact of temporality on model performance for predicting COVID-19 infections, but it did not find such a negative impact of temporality for predicting COVID-19 mortality. Shapley values revealed that a fixed set of four features made predominant contributions to the mortality model across all three years of data (2020–2022), while for the infection model there was no such fixed set of features across years. |
| format | Article |
| id | doaj-art-86ae0f59b08e44ab87426fa1ba3cf986 |
| institution | Kabale University |
| issn | 1471-2288 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Research Methodology |
| spelling | doaj-art-86ae0f59b08e44ab87426fa1ba3cf986; 2025-08-20T03:53:31Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2025-04-01; 251112; 10.1186/s12874-025-02572-8; An investigation into the impact of temporality on COVID-19 infection and mortality predictions: new perspective based on Shapley Values; Mingming Chen (Academy of Pharmacy, Xi’an Jiaotong-Liverpool University); Qihang Qian (School of Computer Science and Technology, Zhejiang University of Technology); Xiang Pan (School of Computer Science and Technology, Zhejiang University of Technology); Tenglong Li (Academy of Pharmacy, Xi’an Jiaotong-Liverpool University); Abstract. Introduction: Machine learning models have been employed to predict COVID-19 infections and mortality, but many models were built on training and testing sets from different periods. The purpose of this study is to investigate the impact of temporality, i.e., the temporal gap between training and testing sets, on model performance in predicting COVID-19 infections and mortality. Furthermore, this study seeks to understand the causes of that impact. Methods: This study used a COVID-19 surveillance dataset collected in Brazil in 2020, 2021 and 2022, and built prediction models for COVID-19 infections and mortality using random forest and logistic regression, with 20 model features. Models were trained and tested on data from different years as well as from the same year, to examine the impact of temporality. To further explain the impact of temporality and its driving factors, Shapley values were employed to quantify individual feature contributions to model predictions. Results: For the infection model, we found that the temporal gap had a negative impact on prediction accuracy. On average, the loss in accuracy was 0.0256 for logistic regression and 0.0436 for random forest when there was a temporal gap between the training and testing sets. For the mortality model, the loss in accuracy was 0.0144 for logistic regression and 0.0098 for random forest, which means the impact of temporality was not as strong as in the infection model. Shapley values uncovered the reason behind these differences between the infection and mortality models. Conclusions: Our study confirmed the negative impact of temporality on model performance for predicting COVID-19 infections, but it did not find such a negative impact of temporality for predicting COVID-19 mortality. Shapley values revealed that a fixed set of four features made predominant contributions to the mortality model across all three years of data (2020–2022), while for the infection model there was no such fixed set of features across years.; https://doi.org/10.1186/s12874-025-02572-8; Temporality import; Shapley values; Random forest; COVID-19 infection prediction; COVID-19 mortality prediction |
| title | An investigation into the impact of temporality on COVID-19 infection and mortality predictions: new perspective based on Shapley Values |
| title_sort | investigation into the impact of temporality on covid 19 infection and mortality predictions new perspective based on shapley values |
| topic | Temporality import; Shapley values; Random forest; COVID-19 infection prediction; COVID-19 mortality prediction |
| url | https://doi.org/10.1186/s12874-025-02572-8 |
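
The abstract says Shapley values were used to quantify individual contributions to model predictions, but this record does not say how they were computed. As a rough illustration of the concept only, and not the authors' implementation, the sketch below computes exact Shapley values for a small hypothetical linear model by averaging each feature's marginal contribution over all feature subsets. The names `shapley_values`, `toy_model`, `WEIGHTS`, `BASELINE` and `INSTANCE`, and all numbers, are invented for this example; features outside a subset are replaced by a baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over feature indices 0..n-1.

    value_fn takes a frozenset of "present" feature indices and returns
    the model output for that coalition. Cost grows as 2**n_features,
    so this is only practical for a handful of features.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model and data (illustrative values only).
WEIGHTS = [0.5, -0.25, 1.0]
BASELINE = [0.0, 1.0, 2.0]   # reference values used for "absent" features
INSTANCE = [1.0, 0.0, 3.0]   # the observation being explained


def toy_model(x):
    """A linear score; stands in for any fitted prediction model."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


def value_fn(subset):
    """Model output with features in `subset` taken from INSTANCE, rest from BASELINE."""
    x = [INSTANCE[j] if j in subset else BASELINE[j] for j in range(len(INSTANCE))]
    return toy_model(x)


phi = shapley_values(value_fn, len(INSTANCE))
for index, contribution in enumerate(phi):
    print(f"feature {index}: {contribution:+.4f}")
```

For a linear model with this baseline convention, each Shapley value should equal the weight times the feature's deviation from its baseline, and the values should sum to the difference between the prediction for `INSTANCE` and the prediction for `BASELINE`. With 20 features, as in the study, exact enumeration would be too slow and an approximation method would be needed.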