Data Balancing Techniques Using the PCA-KMeans and ADASYN for Possible Stroke Disease Cases

Bibliographic Details
Main Authors:	Uung Ungkawa, Muhammad Avilla Rafi
Format:	Article
Language:	English
Published:	Department of Informatics, UIN Sunan Gunung Djati Bandung, 2024-06-01
Series:	JOIN: Jurnal Online Informatika
Subjects:	ADASYN; imbalanced data; machine learning; PCA-KMeans; stroke
Online Access:	https://join.if.uinsgd.ac.id/index.php/join/article/view/1293

Description
Summary:	Imbalanced data occurs when the class distribution is unequal, that is, when the positive and negative classes have different numbers of samples. In healthcare, the majority class typically consists of healthy patients, while the minority class contains sick patients. This imbalance can cause minority-class predictions to be wrong, because the model tends to predict the majority class. In this study, we use a deep neural network trained with focal loss, which addresses class imbalance during training. To balance the data, we use a combined PCA-KMeans model to reduce the size of the dataset and ADASYN to generate additional synthetic samples for the minority class. The research question is how much these two techniques improve model performance, particularly in classifying minority cases. Without data balancing, the best model is the mild model, with an accuracy of 84%, a class 0 F1-score of 86%, and a class 1 F1-score of 82%. With PCA-KMeans data balancing, the best model is the moderate model, with an accuracy of 89%, a class 0 F1-score of 91%, and a class 1 F1-score of 85%. With ADASYN data balancing, the best model is the extreme model, with an accuracy of 95%, a class 0 F1-score of 96%, and a class 1 F1-score of 96%. Of the three test scenarios, the best model is obtained using ADASYN data balancing with the extreme model, with an accuracy of 95% and a class 0 F1-score of 93%.
ISSN:	2528-1682, 2527-9165
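The abstract names the two balancing steps but not their implementation details. Below is a minimal NumPy sketch (not the authors' code) of how each step is commonly done on a binary dataset. The ADASYN part follows the standard algorithm of He et al. (2008): minority points surrounded by more majority neighbours receive more synthetic samples, interpolated towards their minority neighbours. For PCA-KMeans, the abstract only says it shrinks the dataset, so the sketch assumes cluster-based undersampling of the majority class: project it onto its principal components, run k-means with as many clusters as there are minority samples, and keep the real majority sample nearest each centroid. The function names `adasyn` and `pca_kmeans_undersample`, the toy data, and the defaults (`k=5`, `n_components=2`) are hypothetical.

```python
import numpy as np


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Minimal ADASYN oversampling (He et al., 2008) for a binary problem."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_min, n_maj = len(X_min), int(np.sum(y != minority))
    G = int(round((n_maj - n_min) * beta))  # nominal number of synthetic samples
    if G <= 0:
        return X, y
    # Distances from each minority point to every point in the dataset.
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    # Skip column 0 of argsort (normally the point itself, at distance 0).
    nn_all = np.argsort(d_all, axis=1)[:, 1:k + 1]
    # Share of majority points among each minority point's k neighbours.
    r = np.sum(y[nn_all] != minority, axis=1) / k
    r = r / r.sum() if r.sum() > 0 else np.full(n_min, 1.0 / n_min)
    g = np.rint(r * G).astype(int)  # synthetic samples per minority point
    # k nearest minority neighbours, used as interpolation targets.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k + 1]
    synthetic = []
    for i, count in enumerate(g):
        for _ in range(count):
            j = rng.choice(nn_min[i])
            lam = rng.random()  # random point on the segment x_i -> x_j
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    if not synthetic:
        return X, y
    X_syn = np.array(synthetic)
    y_syn = np.full(len(X_syn), minority)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])


def pca_kmeans_undersample(X, y, majority=0, n_components=2, n_iter=50, seed=0):
    """Assumption: PCA-KMeans used as cluster-based majority undersampling."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y != majority)
    n_keep = len(min_idx)  # target: as many majority samples as minority ones
    if n_keep >= len(maj_idx):
        return X, y
    # PCA via SVD of the centred majority data.
    Xc = X[maj_idx] - X[maj_idx].mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T
    # Plain Lloyd's k-means in PCA space with k = n_keep clusters.
    centers = Z[rng.choice(len(Z), n_keep, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None] - centers[None], axis=2)
        labels = np.argmin(dist, axis=1)
        for c in range(n_keep):
            members = Z[labels == c]
            if len(members):  # leave empty clusters where they are
                centers[c] = members.mean(axis=0)
    # Keep the real majority sample closest to each centroid (at most n_keep).
    dist = np.linalg.norm(Z[:, None] - centers[None], axis=2)
    nearest = np.unique(np.argmin(dist, axis=0))
    keep = np.concatenate([maj_idx[nearest], min_idx])
    return X[keep], y[keep]


# Toy imbalanced data: 90 "healthy" (class 0) vs 10 "sick" (class 1) rows.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (90, 4)), rng.normal(2, 1, (10, 4))])
y = np.array([0] * 90 + [1] * 10)
X_up, y_up = adasyn(X, y)
X_down, y_down = pca_kmeans_undersample(X, y)
print("ADASYN class counts:", np.bincount(y_up))
print("PCA-KMeans class counts:", np.bincount(y_down))
```

Because of rounding, ADASYN may produce slightly more or fewer synthetic samples than the nominal `G`. For real experiments, the imbalanced-learn library's `imblearn.over_sampling.ADASYN` class (used via `fit_resample(X, y)`) is a tested alternative; the hand-written version above only makes the sampling logic visible.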