SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models

Bibliographic Details
Main Authors: Gazi Husain, Daniel Nasef, Rejath Jose, Jonathan Mayer, Molly Bekbolatova, Timothy Devine, Milan Toma
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Algorithms
Online Access:https://www.mdpi.com/1999-4893/18/1/37
collection DOAJ
description Class imbalance is a prevalent challenge in machine learning: when the data distribution is skewed toward one class, models tend to prioritize the majority class and underperform on the minority classes. This bias can significantly undermine the accuracy of predictions in real-world scenarios, which makes robust handling of imbalanced data important for dependable results. This study examines one such scenario, real-time monitoring systems for fall risk assessment in bedridden patients, where class imbalance may compromise the effectiveness of machine learning. It compares two resampling techniques, the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with Edited Nearest Neighbors (SMOTEENN), in mitigating class imbalance and improving predictive performance. Using a controlled sampling strategy across various instance levels, the performance of both methods was evaluated in conjunction with decision tree regression, gradient boosting regression, and Bayesian regression models. The results indicate that SMOTEENN consistently outperforms SMOTE in terms of accuracy and mean squared error across all sample sizes and models. SMOTEENN also yields healthier learning curves, suggesting improved generalization, particularly for a sampling strategy with a given number of instances. Furthermore, cross-validation analysis shows that SMOTEENN achieves a higher mean accuracy and a lower standard deviation than SMOTE, indicating more stable and reliable performance. These findings suggest that SMOTEENN is a more effective technique for handling class imbalance and may contribute to more accurate and generalizable predictive models in various applications.
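The abstract contrasts SMOTE with SMOTEENN without spelling out the mechanics. As a hedged illustration of the general techniques (not the authors' code, data, or regression pipeline), the sketch below implements both in plain Python. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbors. SMOTEENN then applies Edited Nearest Neighbors, which discards samples whose label disagrees with their neighbors. The function names (`smote`, `enn`, `smoteenn`), the toy data, the neighbor counts (k = 5 for SMOTE, k = 3 for ENN), the balance-to-largest-class target, and the majority-vote editing rule are all illustrative assumptions; library implementations such as imbalanced-learn differ in defaults and details.

```python
import random
from collections import Counter


def knn_indices(points, query_idx, k, candidates):
    """Indices from `candidates` of the k points closest to points[query_idx]
    (squared Euclidean distance), excluding the query point itself."""
    q = points[query_idx]
    others = [i for i in candidates if i != query_idx]
    others.sort(key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], q)))
    return others[:k]


def smote(X, y, minority, k=5, seed=0):
    """Add synthetic `minority` samples until that class matches the largest class.
    Assumes at least two real minority samples."""
    rng = random.Random(seed)
    X, y = [list(p) for p in X], list(y)  # copy so the caller's data is untouched
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = max(Counter(y).values()) - len(min_idx)
    for _ in range(n_new):
        i = rng.choice(min_idx)                        # a real minority sample
        j = rng.choice(knn_indices(X, i, k, min_idx))  # one of its k minority neighbors
        gap = rng.random()                             # position along the segment, in [0, 1)
        X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y.append(minority)
    return X, y


def enn(X, y, k=3):
    """Edited Nearest Neighbors (majority-vote variant, an assumption): drop each
    sample whose label differs from the most common label of its k nearest neighbors."""
    idx = range(len(X))
    keep = []
    for i in idx:
        votes = Counter(y[j] for j in knn_indices(X, i, k, idx))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smoteenn(X, y, minority, k_smote=5, k_enn=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning."""
    X_os, y_os = smote(X, y, minority, k=k_smote, seed=seed)
    return enn(X_os, y_os, k=k_enn)


# Hypothetical toy data: 40 majority points near (0, 0), 6 minority points near (3, 3).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(6)]
y = [0] * 40 + [1] * 6

X_smote, y_smote = smote(X, y, minority=1)
X_se, y_se = smoteenn(X, y, minority=1)
print("original:", Counter(y))
print("SMOTE:   ", Counter(y_smote))
print("SMOTEENN:", Counter(y_se))
```

Because SMOTE only adds points, `y_smote` ends up exactly balanced (40 and 40 here). The ENN pass can then remove samples from either class that sit among neighbors of the other class, so the SMOTEENN output may be smaller and not exactly balanced.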
id doaj-art-fbfaef81588b437fae555cf98dde05bd
institution Kabale University
issn 1999-4893
doi 10.3390/a18010037
citation Algorithms 2025, 18(1), 37
affiliation Gazi Husain: Department of Anatomy, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
Daniel Nasef: Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
Rejath Jose: Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
Jonathan Mayer: Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
Molly Bekbolatova: Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
Timothy Devine: The Ferrara Center for Patient Safety and Clinical Simulation, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
Milan Toma: Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
topic class imbalance
SMOTE
SMOTEENN
oversampling
machine learning