Emerging SMOTE and GAN Variants for Data Augmentation in Imbalance Machine Learning Tasks: A Review
Class imbalance is a pervasive challenge in real-world machine learning (ML) applications, where the minority class, often the class of interest, is significantly underrepresented. This imbalance can degrade model performance, result in misleading evaluation metrics, and complicate validation proces...
| Main Authors: | Amadi G. Udu, Marwah T. Salman, Maryam K. Ghalati, Andrea Lecchini-Visintini, David R. Siddle, Hongbiao Dong |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Class imbalance, data-augmentation, generative adversarial networks, machine learning, SMOTE |
| Online Access: | https://ieeexplore.ieee.org/document/11062634/ |
| _version_ | 1850116880130375680 |
|---|---|
| author | Amadi G. Udu; Marwah T. Salman; Maryam K. Ghalati; Andrea Lecchini-Visintini; David R. Siddle; Hongbiao Dong |
| author_sort | Amadi G. Udu |
| collection | DOAJ |
| description | Class imbalance is a pervasive challenge in real-world machine learning (ML) applications, where the minority class, often the class of interest, is significantly underrepresented. This imbalance can degrade model performance, result in misleading evaluation metrics, and complicate validation processes. Two prominent data-augmentation techniques to address class imbalance are the Synthetic Minority Oversampling Technique (SMOTE) and Generative Adversarial Networks (GAN). However, both techniques have inherent limitations, motivating the emergence of novel variants designed to overcome these challenges. While previous reviews have typically focused on specific domains, conventional methodologies, or broad strategy overviews, this review presents a unified taxonomy that outlines the causes, types, and implications of class imbalance across diverse ML tasks. It further examines emerging trends in the application of SMOTE and GAN techniques, their limitations, and hybrid adaptations. By categorising imbalance types and analysing models, metrics, datasets, and comparative approaches, this review provides actionable insights and identifies future research directions for practitioners and researchers working to address class imbalance in real-world ML tasks. |
| format | Article |
| id | doaj-art-cfb60900beed406f9df758c752aa91d1 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-cfb60900beed406f9df758c752aa91d1; 2025-08-20T02:36:12Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 113838; 113853; 10.1109/ACCESS.2025.3584532; 11062634; Emerging SMOTE and GAN Variants for Data Augmentation in Imbalance Machine Learning Tasks: A Review; Amadi G. Udu [0] (https://orcid.org/0000-0001-8944-4940); Marwah T. Salman [1] (https://orcid.org/0009-0009-9909-8055); Maryam K. Ghalati [2]; Andrea Lecchini-Visintini [3] (https://orcid.org/0000-0002-1654-8877); David R. Siddle [4] (https://orcid.org/0000-0002-1125-5610); Hongbiao Dong [5] (https://orcid.org/0000-0003-1244-0364); affiliations as listed: School of Engineering, University of Leicester, Leicester, U.K.; School of Engineering, University of Leicester, Leicester, U.K.; School of Engineering, University of Leicester, Leicester, U.K.; School of Electronics and Computer Science, University of Southampton, Southampton, U.K.; School of Engineering, University of Leicester, Leicester, U.K.; School of Engineering, University of Leicester, Leicester, U.K.; https://ieeexplore.ieee.org/document/11062634/; Class imbalance; data-augmentation; generative adversarial networks; machine learning; SMOTE |
| title | Emerging SMOTE and GAN Variants for Data Augmentation in Imbalance Machine Learning Tasks: A Review |
| topic | Class imbalance; data-augmentation; generative adversarial networks; machine learning; SMOTE |
| url | https://ieeexplore.ieee.org/document/11062634/ |