A machine learning-based model for predicting paroxysmal and persistent atrial fibrillation based on EHR


Bibliographic Details
Main Authors: Yuqi Zhang, Sijin Li, Peibiao Mai, Yanqi Yang, Niansang Luo, Chao Tong, Kuan Zeng, Kun Zhang
Format: Article
Language: English
Published: BMC 2025-02-01
Series: BMC Medical Informatics and Decision Making
Online Access: https://doi.org/10.1186/s12911-025-02880-5
Description
Summary: Background: There is no effective way to accurately predict the paroxysmal and persistent subtypes of atrial fibrillation (AF) without electrocardiogram (ECG) observation. We aimed to develop a machine learning model to identify paroxysmal and persistent AF and to investigate the influencing factors. Methods: We collected demographic data, medication use, serological indicators, and baseline cardiac ultrasound data for all included subjects, totaling 50 variables. AF subtype diagnoses were confirmed by ECG observation for more than 7 days. Variable selection was performed by Spearman correlation analysis, recursive feature elimination, and least absolute shrinkage and selection operator (LASSO) regression. We built a prediction model for AF using three machine learning methods. Finally, the importance of each variable was analyzed with the Shapley additive explanations (SHAP) method. Results: After screening, the optimal variable set consisted of 10 variables. The model achieved good predictive performance (AUC = 0.870, 95% CI 0.858 to 0.882), with a specificity of 0.851 (95% CI 0.844 to 0.858) and a sensitivity of 0.716 (95% CI 0.676 to 0.755). Predictive performance remained good across age and gender subgroups. LA and NT-proBNP were the two most important variables for predicting paroxysmal and persistent AF in all models, except in the female subgroup aged under 60 years. Conclusions: Our model makes it possible to predict paroxysmal and persistent AF from baseline data at admission. Early, individualized intervention strategies based on our model may help improve clinical outcomes in AF patients.
ISSN: 1472-6947
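
Note on the reported metrics: the summary reports AUC, sensitivity, and specificity for a two-class model. The sketch below shows how these three quantities are conventionally defined. It is an illustration only, not the authors' code. The labels, scores, the 0.5 decision threshold, the choice of which AF subtype counts as the positive class (label 1), and the function names are all hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positive and 4 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 (not taken from the article).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the summary can report an AUC of 0.870 alongside a lower sensitivity of 0.716.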