A Multi-Stage Framework for Kawasaki Disease Prediction Using Clustering-Based Undersampling and Synthetic Data Augmentation: Cross-Institutional Validation with Dual-Center Clinical Data in Taiwan

Kawasaki disease (KD) is a rare yet potentially life-threatening pediatric vasculitis that, if left undiagnosed or untreated, can result in serious cardiovascular complications. Its heterogeneous clinical presentation poses diagnostic challenges, often failing to meet classical criteria and increasi...

Bibliographic Details
Main Authors: Heng-Chih Huang, Chuan-Sheng Hung, Chun-Hung Richard Lin, Yi-Zhen Shie, Cheng-Han Yu, Ting-Hsin Huang
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Bioengineering
Subjects: Kawasaki disease; class imbalance; clustering; ensemble learning; data augmentation
Online Access: https://www.mdpi.com/2306-5354/12/7/742
author Heng-Chih Huang
Chuan-Sheng Hung
Chun-Hung Richard Lin
Yi-Zhen Shie
Cheng-Han Yu
Ting-Hsin Huang
collection DOAJ
description Kawasaki disease (KD) is a rare yet potentially life-threatening pediatric vasculitis that, if left undiagnosed or untreated, can result in serious cardiovascular complications. Its heterogeneous clinical presentation poses diagnostic challenges, often failing to meet classical criteria and increasing the risk of oversight. Leveraging routine laboratory tests with AI offers a promising strategy for enhancing early detection. However, due to the extremely low prevalence of KD, conventional models often struggle with severe class imbalance, limiting their ability to achieve both high sensitivity and specificity in practice. To address this issue, we propose a multi-stage AI-based predictive framework that incorporates clustering-based undersampling, data augmentation, and stacking ensemble learning. The model was trained and internally tested on clinical blood and urine test data from Chang Gung Memorial Hospital (CGMH, n = 74,641; 2010–2019), and externally validated using an independent dataset from Kaohsiung Medical University Hospital (KMUH, n = 1582; 2012–2020), thereby supporting cross-institutional generalizability. At a fixed recall rate of 95%, the model achieved a specificity of 97.5% and an F1-score of 53.6% on the CGMH test set, and a specificity of 74.7% with an F1-score of 23.4% on the KMUH validation set. These results underscore the model’s ability to maintain high specificity even under sensitivity-focused constraints, while still delivering clinically meaningful predictive performance. This balance of sensitivity and specificity highlights the framework’s practical utility for real-world KD screening.
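The description reports specificity and F1-score at a fixed recall of 95%. As a rough illustration of how such an operating point can be computed from model scores, the following is a minimal sketch, not the authors' implementation. The function names `threshold_at_recall` and `metrics_at_threshold` are hypothetical, the toy scores are invented for demonstration, and the record does not say how the authors chose their threshold.

```python
import math


def threshold_at_recall(scores, labels, target_recall=0.95):
    """Return the highest score threshold whose recall is >= target_recall.

    Hypothetical helper: `scores` are predicted probabilities and
    `labels` are 1 (positive, e.g. KD) or 0 (negative).
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must be flagged to reach the target recall
    # (small epsilon guards against floating-point overshoot in ceil).
    k = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    return positives[k - 1]


def metrics_at_threshold(scores, labels, threshold):
    """Compute recall, specificity, precision and F1 for `score >= threshold`."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}


# Toy data (invented): 4 positives, 6 negatives.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.5, 0.3, 0.1, 0.05, 0.01]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

toy_threshold = threshold_at_recall(toy_scores, toy_labels, target_recall=0.95)
toy_metrics = metrics_at_threshold(toy_scores, toy_labels, toy_threshold)
print(toy_threshold, toy_metrics)
```

Because positives are rare in a screening population, even a small false-positive rate can produce many more false alarms than true cases, which is one reason F1 can stay modest while specificity is high.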
format Article
id doaj-art-b8993497cb7c451a99fb3be4c107d10e
institution Kabale University
issn 2306-5354
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Bioengineering
spelling doaj-art-b8993497cb7c451a99fb3be4c107d10e | 2025-08-20T03:36:13Z | eng | MDPI AG | Bioengineering | ISSN 2306-5354 | 2025-07-01 | vol. 12, no. 7, art. 742 | DOI 10.3390/bioengineering12070742
author_affiliation (all six authors) Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
title A Multi-Stage Framework for Kawasaki Disease Prediction Using Clustering-Based Undersampling and Synthetic Data Augmentation: Cross-Institutional Validation with Dual-Center Clinical Data in Taiwan
topic Kawasaki disease
class imbalance
clustering
ensemble learning
data augmentation
url https://www.mdpi.com/2306-5354/12/7/742