Emerging SMOTE and GAN Variants for Data Augmentation in Imbalance Machine Learning Tasks: A Review

Bibliographic Details
Main Authors: Amadi G. Udu, Marwah T. Salman, Maryam K. Ghalati, Andrea Lecchini-Visintini, David R. Siddle, Hongbiao Dong
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11062634/
ISSN: 2169-3536
Volume: 13
Pages: 113838-113853
DOI: 10.1109/ACCESS.2025.3584532
Author affiliations:
Amadi G. Udu (ORCID 0000-0001-8944-4940): School of Engineering, University of Leicester, Leicester, U.K.
Marwah T. Salman (ORCID 0009-0009-9909-8055): School of Engineering, University of Leicester, Leicester, U.K.
Maryam K. Ghalati: School of Engineering, University of Leicester, Leicester, U.K.
Andrea Lecchini-Visintini (ORCID 0000-0002-1654-8877): School of Electronics and Computer Science, University of Southampton, Southampton, U.K.
David R. Siddle (ORCID 0000-0002-1125-5610): School of Engineering, University of Leicester, Leicester, U.K.
Hongbiao Dong (ORCID 0000-0003-1244-0364): School of Engineering, University of Leicester, Leicester, U.K.

Abstract
Class imbalance is a pervasive challenge in real-world machine learning (ML) applications, where the minority class, often the class of interest, is significantly underrepresented. This imbalance can degrade model performance, result in misleading evaluation metrics, and complicate validation processes. Two prominent data-augmentation techniques to address class imbalance are the Synthetic Minority Oversampling Technique (SMOTE) and Generative Adversarial Networks (GAN). However, both techniques have inherent limitations, motivating the emergence of novel variants designed to overcome these challenges. While previous reviews have typically focused on specific domains, conventional methodologies, or broad strategy overviews, this review presents a unified taxonomy that outlines the causes, types, and implications of class imbalance across diverse ML tasks. It further examines emerging trends in the application of SMOTE and GAN techniques, their limitations, and hybrid adaptations. By categorising imbalance types and analysing models, metrics, datasets, and comparative approaches, this review provides actionable insights and identifies future research directions for practitioners and researchers working to address class imbalance in real-world ML tasks.

Keywords: Class imbalance; data-augmentation; generative adversarial networks; machine learning; SMOTE
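
The abstract names SMOTE as one of the two main augmentation techniques for class imbalance. As a hedged illustration only (this is not code from the review), the sketch below shows the basic SMOTE interpolation step: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are hypothetical; neighbour search is brute-force Euclidean in plain Python, so it is meant for small examples rather than real datasets.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical helper: SMOTE-style oversampling of one minority class.

    minority: list of feature vectors (lists of floats) of equal length.
    n_synthetic: number of synthetic samples to generate.
    k: number of nearest minority-class neighbours to interpolate towards.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Brute-force k nearest neighbours of each minority sample,
    # searched only within the minority class.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample at random
        j = rng.choice(neighbours[i])      # pick one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        # x_new = x + gap * (nn - x): a point on the segment from x to nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    new_points = smote(toy_minority, n_synthetic=6, k=2, seed=0)
    print(len(new_points))  # 6
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class; this is also why plain SMOTE can blur class boundaries when minority samples lie close to majority samples, one of the kinds of limitation that motivates the variants the review surveys.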