Data Balancing Techniques Using the PCA-KMeans and ADASYN for Possible Stroke Disease Cases
Imbalanced data happens when the distribution of classes is not equal between positive and negative classes. In healthcare, the majority class typically consists of healthy patient data, while the minority class contains sick patient data. This condition can cause the minority class prediction to be...
Saved in:
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Department of Informatics, UIN Sunan Gunung Djati Bandung
2024-06-01
|
| Series: | JOIN: Jurnal Online Informatika |
| Subjects: | |
| Online Access: | https://join.if.uinsgd.ac.id/index.php/join/article/view/1293 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Imbalanced data happens when the distribution of classes is not equal between positive and negative classes. In healthcare, the majority class typically consists of healthy patient data, while the minority class contains sick patient data. This condition can cause the minority class prediction to be wrong because the model tends to predict the majority class. In this study, we use a deep neural network algorithm with focal loss that can deal with class imbalance during training. To balance the data, we use the PCA-KMeans combination model to shrink the dataset and the ADASYN model to give the minority class more samples than it needs. In this study, the research problem is how well the two techniques can improve model performance, especially in minority case classification. The mild model is the best without data balancing, resulting in an accuracy value of 84%. The class 0 F1-score has a value of 86%, whereas the class 1 F1-score has a value of 82%. The moderate model is the best model in the case study of PCA-KMeans balancing data, resulting in an accuracy value of 89%; the class 0 F1-score is 91%; and the class 1 F1-score is 85%. The extreme model is the best model in the ADASYN data balancing case study, resulting in an accuracy value of 95%; the value in class 0 gets a F1-score of 96%, while the value in class 1 gets a F1-score of 96%. Of the three test models, the best model is obtained using ADASYN extreme data balancing with an accuracy value of 95%, the value in class 0 with a F1- score of 93%. |
|---|---|
| ISSN: | 2528-1682 2527-9165 |