A Multi-Stage Framework for Kawasaki Disease Prediction Using Clustering-Based Undersampling and Synthetic Data Augmentation: Cross-Institutional Validation with Dual-Center Clinical Data in Taiwan
Kawasaki disease (KD) is a rare yet potentially life-threatening pediatric vasculitis that, if left undiagnosed or untreated, can result in serious cardiovascular complications. Its heterogeneous clinical presentation poses diagnostic challenges, often failing to meet classical criteria and increasi...
| Main Authors: | Heng-Chih Huang, Chuan-Sheng Hung, Chun-Hung Richard Lin, Yi-Zhen Shie, Cheng-Han Yu, Ting-Hsin Huang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Bioengineering |
| Subjects: | Kawasaki disease; class imbalance; clustering; ensemble learning; data augmentation |
| Online Access: | https://www.mdpi.com/2306-5354/12/7/742 |
| _version_ | 1849407007813009408 |
|---|---|
| author | Heng-Chih Huang; Chuan-Sheng Hung; Chun-Hung Richard Lin; Yi-Zhen Shie; Cheng-Han Yu; Ting-Hsin Huang |
| author_sort | Heng-Chih Huang |
| collection | DOAJ |
| description | Kawasaki disease (KD) is a rare yet potentially life-threatening pediatric vasculitis that, if left undiagnosed or untreated, can result in serious cardiovascular complications. Its heterogeneous clinical presentation poses diagnostic challenges, often failing to meet classical criteria and increasing the risk of oversight. Leveraging routine laboratory tests with AI offers a promising strategy for enhancing early detection. However, due to the extremely low prevalence of KD, conventional models often struggle with severe class imbalance, limiting their ability to achieve both high sensitivity and specificity in practice. To address this issue, we propose a multi-stage AI-based predictive framework that incorporates clustering-based undersampling, data augmentation, and stacking ensemble learning. The model was trained and internally tested on clinical blood and urine test data from Chang Gung Memorial Hospital (CGMH, n = 74,641; 2010–2019), and externally validated using an independent dataset from Kaohsiung Medical University Hospital (KMUH, n = 1582; 2012–2020), thereby supporting cross-institutional generalizability. At a fixed recall rate of 95%, the model achieved a specificity of 97.5% and an F1-score of 53.6% on the CGMH test set, and a specificity of 74.7% with an F1-score of 23.4% on the KMUH validation set. These results underscore the model’s ability to maintain high specificity even under sensitivity-focused constraints, while still delivering clinically meaningful predictive performance. This balance of sensitivity and specificity highlights the framework’s practical utility for real-world KD screening. |
| format | Article |
| id | doaj-art-b8993497cb7c451a99fb3be4c107d10e |
| institution | Kabale University |
| issn | 2306-5354 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Bioengineering |
| spelling | doaj-art-b8993497cb7c451a99fb3be4c107d10e; 2025-08-20T03:36:13Z; eng; MDPI AG; Bioengineering; ISSN 2306-5354; 2025-07-01; vol. 12, no. 7, art. 742; DOI 10.3390/bioengineering12070742. Authors (all: Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan): Heng-Chih Huang, Chuan-Sheng Hung, Chun-Hung Richard Lin, Yi-Zhen Shie, Cheng-Han Yu, Ting-Hsin Huang. Keywords: Kawasaki disease; class imbalance; clustering; ensemble learning; data augmentation. https://www.mdpi.com/2306-5354/12/7/742 |
| title | A Multi-Stage Framework for Kawasaki Disease Prediction Using Clustering-Based Undersampling and Synthetic Data Augmentation: Cross-Institutional Validation with Dual-Center Clinical Data in Taiwan |
| topic | Kawasaki disease; class imbalance; clustering; ensemble learning; data augmentation |
| url | https://www.mdpi.com/2306-5354/12/7/742 |
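
The description reports specificity and F1-score "at a fixed recall rate of 95%". The record does not say how the operating threshold was chosen, so the following is only a minimal sketch of one common way to evaluate a classifier under such a constraint: take the highest score threshold that still flags the required share of true cases, then compute the other metrics at that threshold. The function name `metrics_at_fixed_recall` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
import math


def metrics_at_fixed_recall(y_true, scores, target_recall=0.95):
    """Return threshold, recall, specificity and F1 at the highest score
    threshold whose recall is at least target_recall (illustrative only)."""
    positive_scores = sorted(
        (s for s, y in zip(scores, y_true) if y == 1), reverse=True
    )
    if not positive_scores:
        raise ValueError("y_true must contain at least one positive label")

    # Number of positives that must be flagged to reach the target recall.
    # The small epsilon guards against floating-point round-up in the product.
    needed = max(1, math.ceil(target_recall * len(positive_scores) - 1e-9))
    threshold = positive_scores[needed - 1]

    # Confusion-matrix counts when every score >= threshold is flagged.
    tp = fp = tn = fn = 0
    for s, y in zip(scores, y_true):
        flagged = s >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1

    recall = tp / (tp + fn)
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "threshold": threshold,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical toy data: 4 positive cases among 10 samples.
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05, 0.01]
    print(metrics_at_fixed_recall(labels, model_scores, target_recall=0.95))
```

Fixing recall first and reading off specificity afterwards mirrors a screening setting, where missed cases are costlier than false alarms. With a rare positive class, even a small false-positive rate can outnumber the true positives, which is one reason F1 can stay modest while specificity is high.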