SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models
Class imbalance is a prevalent challenge in machine learning that arises when the data distribution is skewed toward one class over another, causing models to prioritize the majority class and underperform on the minority classes. This bias can significantly undermine accurate predictions in real-world scenar...
Main Authors: | Gazi Husain, Daniel Nasef, Rejath Jose, Jonathan Mayer, Molly Bekbolatova, Timothy Devine, Milan Toma |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Algorithms |
Subjects: | class imbalance; SMOTE; SMOTEENN; oversampling; machine learning |
Online Access: | https://www.mdpi.com/1999-4893/18/1/37 |
_version_ | 1832589444844093440 |
---|---|
author | Gazi Husain; Daniel Nasef; Rejath Jose; Jonathan Mayer; Molly Bekbolatova; Timothy Devine; Milan Toma |
author_facet | Gazi Husain; Daniel Nasef; Rejath Jose; Jonathan Mayer; Molly Bekbolatova; Timothy Devine; Milan Toma |
author_sort | Gazi Husain |
collection | DOAJ |
description | Class imbalance is a prevalent challenge in machine learning that arises when the data distribution is skewed toward one class over another, causing models to prioritize the majority class and underperform on the minority classes. This bias can significantly undermine accurate predictions in real-world scenarios, highlighting the importance of robust handling of imbalanced data for dependable results. This study examines one such scenario, real-time monitoring systems for fall risk assessment in bedridden patients, where class imbalance may compromise the effectiveness of machine learning. It compares the effectiveness of two resampling techniques, the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with Edited Nearest Neighbors (SMOTEENN), in mitigating class imbalance and improving predictive performance. Using a controlled sampling strategy across various instance levels, the study evaluated both methods in conjunction with decision tree regression, gradient boosting regression, and Bayesian regression models. The results indicate that SMOTEENN consistently outperforms SMOTE in terms of accuracy and mean squared error across all sample sizes and models. SMOTEENN also demonstrates healthier learning curves, suggesting improved generalization capabilities, particularly for a sampling strategy with a given number of instances. Furthermore, cross-validation analysis reveals that SMOTEENN achieves higher mean accuracy and lower standard deviation than SMOTE, indicating more stable and reliable performance. These findings suggest that SMOTEENN is a more effective technique for handling class imbalance, potentially contributing to the development of more accurate and generalizable predictive models in various applications. |
format | Article |
id | doaj-art-fbfaef81588b437fae555cf98dde05bd |
institution | Kabale University |
issn | 1999-4893 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Algorithms |
spelling | doaj-art-fbfaef81588b437fae555cf98dde05bd; 2025-01-24T13:17:34Z; eng; MDPI AG; Algorithms; 1999-4893; 2025-01-01; 18; 1; 37; 10.3390/a18010037; SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models; Gazi Husain (Department of Anatomy, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA); Daniel Nasef (Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA); Rejath Jose (Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA); Jonathan Mayer (Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA); Molly Bekbolatova (Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA); Timothy Devine (The Ferrara Center for Patient Safety and Clinical Simulation, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA); Milan Toma (Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA); https://www.mdpi.com/1999-4893/18/1/37; class imbalance; SMOTE; SMOTEENN; oversampling; machine learning |
spellingShingle | Gazi Husain; Daniel Nasef; Rejath Jose; Jonathan Mayer; Molly Bekbolatova; Timothy Devine; Milan Toma; SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models; Algorithms; class imbalance; SMOTE; SMOTEENN; oversampling; machine learning |
title | SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models |
title_sort | smote vs smoteenn a study on the performance of resampling algorithms for addressing class imbalance in regression models |
topic | class imbalance; SMOTE; SMOTEENN; oversampling; machine learning |
url | https://www.mdpi.com/1999-4893/18/1/37 |
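
The description above contrasts SMOTE with SMOTEENN (SMOTE followed by Edited Nearest Neighbors cleaning). As a rough illustration of how the two steps differ, here is a minimal standard-library sketch. It is not the authors' code and not the implementation of any resampling library: the function names (`k_nearest`, `smote`, `edited_nearest_neighbors`, `smoteenn`), the toy data, and the choice of a majority-vote rule for the ENN step are all assumptions made for this example. It works on binary class labels only and assumes at least two minority samples.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return the k candidates closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def edited_nearest_neighbors(X, y, k=3):
    """Drop every sample whose label disagrees with the majority label
    of its k nearest neighbors (assumed majority-vote ENN rule)."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        rest = [(xj, yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i]
        nearest = sorted(rest, key=lambda item: math.dist(xi, item[0]))[:k]
        majority_label = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        if majority_label == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def smoteenn(X, y, minority_label, k=3, rng=None):
    """Oversample the minority class to parity with SMOTE, then clean
    the combined set with ENN."""
    minority = [x for x, lbl in zip(X, y) if lbl == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_new, 0), k=k, rng=rng)
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    return edited_nearest_neighbors(X_bal, y_bal, k=k)


# Toy imbalanced data (invented for illustration): 40 majority points
# around (0, 0) and 8 minority points around (3, 3).
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(8)]
y = [0] * 40 + [1] * 8

X_res, y_res = smoteenn(X, y, minority_label=1, k=3, rng=rng)
print("before:", Counter(y), "after:", Counter(y_res))
```

In this sketch, plain SMOTE would stop once the synthetic points are appended. The extra ENN pass then discards points from either class whose neighborhood is dominated by the other class, which is the cleaning step that distinguishes SMOTEENN.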