Development of a respiratory virus risk model with environmental data based on interpretable machine learning methods

Bibliographic Details
Main Authors: Shuting Shi, Haowen Lin, Leiming Jiang, Zhiqi Zeng, ChuiXu Lin, Pei Li, Yinghua Li, Zifeng Yang
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: npj Climate and Atmospheric Science
Online Access: https://doi.org/10.1038/s41612-025-00894-4
Description
Summary: In recent years, numerous studies have explored the relationship between atmospheric conditions and respiratory viral infections. However, these investigations have been limited by modestly sized datasets, a restricted geographical focus, and an emphasis on a small number of respiratory pathogens. This study aimed to develop a nationwide respiratory virus infection risk prediction model using a machine learning approach. We used the CRFC algorithm, a random forest-based method for multi-label classification, to predict the presence of various respiratory viruses. The model integrated binary classification outcomes for each virus category and incorporated air quality and meteorological data to improve its accuracy. Data were collected from 31 regions in China between 2016 and 2021 and comprised pathogen detection results, air quality indices, and meteorological measurements. The model’s performance was evaluated using ROC curves, AUC scores, and precision-recall curves. The model achieved an average overall accuracy of 0.76, a macro sensitivity of 0.75, a macro precision of 0.77, and an average AUC score of 0.9. The SHAP framework was employed to interpret the model’s predictions, revealing significant contributions from parameters such as age, NO2 levels, and meteorological conditions. By integrating environmental and clinical data, the model provides a tool for predicting respiratory virus risks, and its performance metrics indicate potential utility in clinical decision-making and public health planning. Future work will focus on refining the model and expanding its applicability to diverse populations and settings.
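Note on the reported metrics: the summary gives macro sensitivity and macro precision over per-virus binary outcomes but does not spell out how they were averaged. The sketch below assumes the standard definition (compute each metric per virus category, then take the unweighted mean). It is not the authors' code; the function names, the three label columns, and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Matrix = List[List[int]]  # rows = samples, columns = virus categories (0/1)


def per_label_counts(y_true: Matrix, y_pred: Matrix, label: int) -> Dict[str, int]:
    """Count true positives, false positives and false negatives for one label."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[label], pred_row[label]
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
    return {"tp": tp, "fp": fp, "fn": fn}


def macro_metrics(y_true: Matrix, y_pred: Matrix) -> Dict[str, float]:
    """Unweighted mean of per-label sensitivity (recall) and precision."""
    n_labels = len(y_true[0])
    sensitivities, precisions = [], []
    for label in range(n_labels):
        c = per_label_counts(y_true, y_pred, label)
        actual_pos = c["tp"] + c["fn"]
        predicted_pos = c["tp"] + c["fp"]
        # Assumption: a label with no positives contributes 0.0 to the mean.
        sensitivities.append(c["tp"] / actual_pos if actual_pos else 0.0)
        precisions.append(c["tp"] / predicted_pos if predicted_pos else 0.0)
    return {
        "macro_sensitivity": sum(sensitivities) / n_labels,
        "macro_precision": sum(precisions) / n_labels,
    }


# Toy data: 4 samples, 3 hypothetical virus categories.
Y_TRUE = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_PRED = [[1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 0, 1]]

result = macro_metrics(Y_TRUE, Y_PRED)
print(result)
```

Because every virus category carries equal weight under macro averaging, a rarely detected pathogen influences the score as much as a common one, which is one reason macro figures are often reported alongside overall accuracy for imbalanced multi-label data.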
ISSN:2397-3722