Imbalanced Data Problem in Machine Learning: A Review
One of the prominent challenges encountered in real-world data is an imbalance, characterized by unequal distribution of observations across different target classes, which complicates achieving accurate model classifications. This survey delves into various machine learning techniques developed to...
Main Authors: | Manahel Altalhan, Abdulmohsen Algarni, Monia Turki-Hadj Alouane |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
Subjects: | Imbalanced data; machine learning; balance techniques; evaluation methods |
Online Access: | https://ieeexplore.ieee.org/document/10845793/ |
_version_ | 1832586872794120192 |
---|---|
author | Manahel Altalhan; Abdulmohsen Algarni; Monia Turki-Hadj Alouane |
author_sort | Manahel Altalhan |
collection | DOAJ |
description | One of the prominent challenges encountered in real-world data is an imbalance, characterized by unequal distribution of observations across different target classes, which complicates achieving accurate model classifications. This survey delves into various machine learning techniques developed to address the difficulties posed by imbalanced data. It discusses data-level methods such as oversampling and undersampling, algorithm-level solutions including ensemble learning and specific algorithm adjustments, cost-sensitive algorithms, and hybrid strategies that combine multiple approaches. Moreover, this paper emphasizes the crucial role of evaluation methods like Precision, F1 Score, Recall, G-mean, and AUC in measuring the effectiveness of these strategies under imbalanced conditions. A detailed review of recent research articles helps pinpoint persistent gaps in generalizability, scalability, and robustness across these methods, underscoring the necessity for ongoing improvements. The survey seeks to offer an extensive overview of current approaches that improve the efficiency and effectiveness of machine learning models dealing with imbalanced datasets, thus equipping researchers with the insights needed to develop robust and effective models ready for real-world application. |
format | Article |
id | doaj-art-890cddfffa2746d08153030d5794a951 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-890cddfffa2746d08153030d5794a951; 2025-01-25T00:01:10Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 13686; 13699; 10.1109/ACCESS.2025.3531662; 10845793; Imbalanced Data Problem in Machine Learning: A Review; Manahel Altalhan (0), https://orcid.org/0009-0002-7807-2162; Abdulmohsen Algarni (1), https://orcid.org/0000-0002-7556-958X; Monia Turki-Hadj Alouane (2), https://orcid.org/0000-0002-6375-0824; Department of Computer Science, King Khalid University, Abha, Saudi Arabia (all three authors); https://ieeexplore.ieee.org/document/10845793/; Imbalanced data; machine learning; balance techniques; evaluation methods |
title | Imbalanced Data Problem in Machine Learning: A Review |
topic | Imbalanced data; machine learning; balance techniques; evaluation methods |
url | https://ieeexplore.ieee.org/document/10845793/ |