Application of SMOTE-ENN Method in Data Balancing for Classification of Diabetes Health Indicators with C4.5 Algorithm

Bibliographic Details
Main Authors: Bakti Putra Pamungkas, Muhammad Jauhar Vikri, Ita Aristia Sa'ida
Format: Article
Language: English
Published: LPPM ISB Atma Luhur 2025-05-01
Series: Jurnal Sisfokom
Subjects:
Online Access: https://jurnal.atmaluhur.ac.id/index.php/sisfokom/article/view/2350
collection DOAJ
description Class imbalance in health datasets often degrades the performance of classification models, especially in detecting minority classes such as diabetics. This study evaluates the effect of the SMOTE-ENN method on the performance of the C4.5 algorithm in classifying diabetes health indicators. The dataset is the 2021 Diabetes Binary Health Indicators BRFSS dataset from Kaggle, which comprises 236,378 respondent records with an imbalanced class distribution: 85.80% non-diabetic and 14.20% diabetic. SMOTE was used to add synthetic samples to the minority class, while ENN was applied to remove samples considered noise. After balancing, the C4.5 algorithm was used for classification. Performance was evaluated using accuracy, precision, recall, and F1-score. Applying SMOTE-ENN improved accuracy from 79.49% to 80.33% and precision from 29% to 30%. Although recall did not increase, the method improved the overall stability of the predictions, especially the accuracy with which the positive class is classified. The novelty of this research lies in the specific application of SMOTE-ENN to a large-scale health dataset with the C4.5 algorithm, a combination that has not been widely explored. Further exploration of other balancing techniques and algorithms is therefore needed to obtain better classification results on imbalanced data.
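The two balancing steps named in the description can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the helper names (k_nearest, smote, enn), the value k=3, the majority-vote rule in the ENN step, and the toy 86/14 two-cluster data are all assumptions chosen only to mirror the reported class ratio on a small scale.

```python
import math
import random
from collections import Counter


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def smote(X, y, minority, k=3, rng=None):
    """Oversample `minority` up to the size of the largest class by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    target = max(Counter(y).values())
    X_new, y_new = list(X), list(y)
    for _ in range(target - len(minority_pts)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(k_nearest(i, minority_pts, min(k, len(minority_pts) - 1)))
        gap = rng.random()  # position along the segment between the two samples
        a, b = minority_pts[i], minority_pts[j]
        X_new.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Edited Nearest Neighbours (simplified): keep a sample only if its label
    matches the majority label among its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in k_nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumption): 86 "non-diabetic" and 14 "diabetic" points in 2-D.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(86)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(14)]
y = [0] * 86 + [1] * 14

X_sm, y_sm = smote(X, y, minority=1, k=3, rng=rng)  # step 1: add synthetic minority samples
X_bal, y_bal = enn(X_sm, y_sm, k=3)                 # step 2: remove samples treated as noise

print("original:", dict(Counter(y)))
print("after SMOTE:", dict(Counter(y_sm)))
print("after ENN:", dict(Counter(y_bal)))
```

For real work, the imbalanced-learn package offers a SMOTEENN class in imblearn.combine that performs both steps. Note also that scikit-learn's DecisionTreeClassifier implements an optimized CART rather than C4.5, so it would be a stand-in for the classifier named here, not an equivalent.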
id doaj-art-d93affabc80d4be6919efb724e0a0837
institution Kabale University
issn 2301-7988
2581-0588
spelling Jurnal Sisfokom, vol. 14, no. 2, pp. 183-188, 2025-05-01. DOI: 10.32736/sisfokom.v14i2.2350
Bakti Putra Pamungkas, Muhammad Jauhar Vikri, Ita Aristia Sa'ida: Department of Informatics Engineering, University of Nahdlatul Ulama Sunan Giri, Bojonegoro
topic smote-enn
data imbalance
c4.5
diabetes
classification