Predicting the risk of pulmonary embolism in patients with tuberculosis using machine learning algorithms

Bibliographic Details
Main Authors: Haobo Kong, Yong Li, Ya Shen, Jingjing Pan, Min Liang, Zhi Geng, Yanbei Zhang
Format: Article
Language: English
Published: BMC 2024-12-01
Series: European Journal of Medical Research
Subjects:
Online Access: https://doi.org/10.1186/s40001-024-02218-3
_version_ 1850035297128022016
collection DOAJ
description Abstract

Background: This study aimed to develop predictive models with robust generalization capabilities for assessing the risk of pulmonary embolism in patients with tuberculosis using machine learning algorithms.

Methods: Data were collected from two centers and divided into development and validation cohorts. In the development cohort, candidate variables were selected using Recursive Feature Elimination (RFE). Five machine learning algorithms were used to construct the predictive models: logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and support vector machine (SVM). Model performance was evaluated with nested cross-validation and the area under the curve (AUC), supplemented by interpretation with SHapley Additive exPlanations (SHAP) and line charts of AUC values. The models were then externally validated in an independent validation cohort, with the aim of supporting early identification and management of pulmonary embolism risk in patients with tuberculosis.

Results: Data from 694 patients were used for model development, and 236 patients from the validation group met the enrollment criteria. The optimal subset of variables comprised D-dimer, smoking status, dyspnea, age, sex, diabetes, platelet count, cough, fibrinogen, hemoglobin, hemoptysis, hypertension, chronic obstructive pulmonary disease (COPD), and chest pain. The RF model outperformed the others, achieving an AUC of 0.839 (95% CI 0.780–0.899) and maintaining the highest average performance in external fivefold cross-validation (AUC: 0.906 ± 0.041).

Conclusions: The RF model demonstrates high and consistent effectiveness in predicting pulmonary embolism risk in tuberculosis patients.
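The abstract reports each AUC with a 95% confidence interval but does not say how the interval was obtained. As a rough illustration only, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and then estimates an interval with a percentile bootstrap. The bootstrap is an assumption made here, not the authors' stated method; the function names `auc` and `bootstrap_ci` are hypothetical, and the labels and scores are synthetic toy values, not study data.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with one class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    k = int(n_boot * alpha / 2)  # number of resamples trimmed from each tail
    return stats[k], stats[n_boot - 1 - k]


if __name__ == "__main__":
    # Synthetic toy data: 1 = pulmonary embolism, 0 = no pulmonary embolism.
    labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
    scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.65, 0.9]
    print("AUC:", auc(labels, scores))  # 21 of 24 pairs ranked correctly: 0.875
    print("95% CI:", bootstrap_ci(labels, scores))
```

With only ten toy cases the interval is very wide; with cohorts the size of those in the abstract, a bootstrap interval would narrow considerably. An analytic alternative such as the DeLong method would serve the same purpose.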
format Article
id doaj-art-2c84295ecac04b1daf5f50e57cba7ffd
institution DOAJ
issn 2047-783X
spelling doaj-art-2c84295ecac04b1daf5f50e57cba7ffd; 2025-08-20T02:57:32Z; eng; BMC; European Journal of Medical Research; 2047-783X; 2024-12-01; volume 29, issue 1; 10.1186/s40001-024-02218-3
Author affiliations:
Haobo Kong: Department of Geriatric Respiratory and Critical Care, Anhui Geriatric Institute, The First Affiliated Hospital of Anhui Medical University
Yong Li: Department of Geriatric Respiratory and Critical Care, Anhui Geriatric Institute, The First Affiliated Hospital of Anhui Medical University
Ya Shen: Department of Respiratory and Critical Care Medicine, Fuyang Infectious Disease Clinical College of Anhui Medical University
Jingjing Pan: Department of Respiratory Intensive Care Unit, Anhui Medical University Clinical College of Chest & Anhui Chest Hospital
Min Liang: Department of Tuberculosis, Anhui Medical University Clinical College of Chest & Anhui Chest Hospital
Zhi Geng: Department of Neurology, The First Affiliated Hospital of Anhui Medical University
Yanbei Zhang: Department of Geriatric Respiratory and Critical Care, Anhui Geriatric Institute, The First Affiliated Hospital of Anhui Medical University
topic Machine learning
Pulmonary embolism
Pulmonary tuberculosis
Risk prediction