Enhancing generalization in a Kawasaki Disease prediction model using data augmentation: Cross-validation of patients from two major hospitals in Taiwan.

Kawasaki Disease (KD) is a rare febrile illness affecting infants and young children, potentially leading to coronary artery complications and, in severe cases, mortality if untreated. However, KD is frequently misdiagnosed as a common fever in clinical settings, and the inherent data imbalance furt...

Bibliographic Details
Main Authors: Chuan-Sheng Hung, Chun-Hung Richard Lin, Jain-Shing Liu, Shi-Huang Chen, Tsung-Chi Hung, Chih-Min Tsai
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0314995
collection DOAJ
description Kawasaki Disease (KD) is a rare febrile illness affecting infants and young children that can lead to coronary artery complications and, in severe cases, death if untreated. KD is frequently misdiagnosed as a common fever in clinical settings, and the inherent class imbalance of KD data further complicates accurate prediction with traditional machine learning and statistical methods. This paper introduces two approaches to address these challenges and improve prediction accuracy and generalizability. The first approach is a stacking model termed the Disease Classifier (DC), designed to recognize minority-class samples within imbalanced datasets and thereby mitigate the bias toward the majority class commonly observed in traditional models. The second approach is a combined model, the Disease Classifier with CTGAN (CTGAN-DC), which integrates DC with a Conditional Tabular Generative Adversarial Network (CTGAN) to further improve data balance and predictive performance. Using CTGAN-based oversampling, this model retains the original data characteristics of KD while expanding data diversity. This balances positive and negative KD samples, reducing model bias toward the majority class and improving both predictive accuracy and generalizability. Experimental evaluations indicate substantial performance gains, with the DC and CTGAN-DC models achieving notably higher predictive accuracy than individual machine learning models. Specifically, the DC model achieves 95% sensitivity and 95% specificity, while the CTGAN-DC model achieves 95% sensitivity and 97% specificity. Furthermore, both models generalize well across diverse KD datasets, particularly the CTGAN-DC model, which surpasses the JAMA model with a 3% increase in sensitivity and a 95% improvement in generalization sensitivity and specificity, resolving the model collapse issue observed in the JAMA model. In sum, the proposed DC and CTGAN-DC architectures demonstrate robust generalizability across multiple KD datasets from various healthcare institutions and significantly outperform other models, including XGBoost. These findings lay a foundation for advancing disease prediction in the context of imbalanced medical data.
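The description reports results as sensitivity and specificity rather than plain accuracy. The following minimal Python sketch illustrates why that choice matters on imbalanced data. It is not the authors' code: the function name `sensitivity_specificity`, the label convention (1 = KD, 0 = non-KD), and the toy counts are all assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumed convention (hypothetical): 1 = KD, 0 = non-KD.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # KD correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # KD missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-KD correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-KD wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy imbalanced sample (illustrative numbers only): 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that always predicts the majority class.
always_negative = [0] * len(y_true)

sens, spec = sensitivity_specificity(y_true, always_negative)
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)

print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case the majority-class predictor looks accurate overall while detecting none of the positive cases, which is the kind of majority-class bias the description says the DC and CTGAN-DC models are meant to reduce.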
id doaj-art-c35afeb6e366475695938fc86b080ebf
institution Kabale University
issn 1932-6203
spelling doaj-art-c35afeb6e366475695938fc86b080ebf; 2025-01-08T05:32:17Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2024-01-01; volume 19, issue 12, e0314995; 10.1371/journal.pone.0314995