Imbalanced Data Problem in Machine Learning: A Review

Bibliographic Details
Main Authors: Manahel Altalhan, Abdulmohsen Algarni, Monia Turki-Hadj Alouane
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Imbalanced data; machine learning; balance techniques; evaluation methods
Online Access: https://ieeexplore.ieee.org/document/10845793/
_version_ 1832586872794120192
author Manahel Altalhan
Abdulmohsen Algarni
Monia Turki-Hadj Alouane
collection DOAJ
description One of the prominent challenges encountered in real-world data is an imbalance, characterized by unequal distribution of observations across different target classes, which complicates achieving accurate model classifications. This survey delves into various machine learning techniques developed to address the difficulties posed by imbalanced data. It discusses data-level methods such as oversampling and undersampling, algorithm-level solutions including ensemble learning and specific algorithm adjustments, cost-sensitive algorithms, and hybrid strategies that combine multiple approaches. Moreover, this paper emphasizes the crucial role of evaluation methods like Precision, F1 Score, Recall, G-mean, and AUC in measuring the effectiveness of these strategies under imbalanced conditions. A detailed review of recent research articles helps pinpoint persistent gaps in generalizability, scalability, and robustness across these methods, underscoring the necessity for ongoing improvements. The survey seeks to offer an extensive overview of current approaches that improve the efficiency and effectiveness of machine learning models dealing with imbalanced datasets, thus equipping researchers with the insights needed to develop robust and effective models ready for real-world application.
format Article
id doaj-art-890cddfffa2746d08153030d5794a951
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-890cddfffa2746d08153030d5794a951
2025-01-25T00:01:10Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2025-01-01, vol. 13, pp. 13686-13699
DOI 10.1109/ACCESS.2025.3531662 (IEEE document 10845793)
Imbalanced Data Problem in Machine Learning: A Review
Manahel Altalhan, https://orcid.org/0009-0002-7807-2162, Department of Computer Science, King Khalid University, Abha, Saudi Arabia
Abdulmohsen Algarni, https://orcid.org/0000-0002-7556-958X, Department of Computer Science, King Khalid University, Abha, Saudi Arabia
Monia Turki-Hadj Alouane, https://orcid.org/0000-0002-6375-0824, Department of Computer Science, King Khalid University, Abha, Saudi Arabia
https://ieeexplore.ieee.org/document/10845793/
Imbalanced data; machine learning; balance techniques; evaluation methods
title Imbalanced Data Problem in Machine Learning: A Review
topic Imbalanced data
machine learning
balance techniques
evaluation methods
url https://ieeexplore.ieee.org/document/10845793/