FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods

The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the most well-known methods to solve the unequal class distribution problem in imbalanced datasets. However, it has three shortcomings: (1) it may cause t...

Bibliographic Details
Main Authors: Roudani Mohammed, El Moutaouakil Karim
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Classification; oversampling; SMOTE; unbalanced data; big data
Online Access: https://ieeexplore.ieee.org/document/10716646/
_version_ 1850150807501012992
author Roudani Mohammed
El Moutaouakil Karim
author_facet Roudani Mohammed
El Moutaouakil Karim
author_sort Roudani Mohammed
collection DOAJ
description The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the best-known methods for addressing unequal class distribution in imbalanced datasets. However, it has three shortcomings: (1) it may cause over-generalization by oversampling noisy samples, (2) it oversamples uninformative samples, and (3) it increases the overlap between classes around the class boundaries. Various SMOTE-based approaches have been proposed to handle these problems, but most of them introduce hyperparameters and tend to generate noise because each synthetic sample is generated at random in the region delimited by randomly chosen minority data. This research introduces an improved SMOTE-based method, Fuzzy-ADAptative-SMOTE-Based-Methods (FADA-SMOTE-Ms), which targets all three problems at the same time. To this end, the $\alpha$-SMOTE is chosen so that the synthetic data is as far as possible from the two closest majority data.
More precisely, the method proceeds in six steps: (a) clustering the minority class into k groups, (b) selecting a safe region, (c) selecting two random minority data, (d) finding the M closest majority data to these minority data using original membership functions based on the fuzzy mean and filtering results, (e) finding the $\alpha$-SMOTE that produces synthetic data as close as possible to the minority class and as far as possible from the M majority data by solving a very simple multi-objective mathematical optimization model, and (f) using SMOTE to generate synthetic samples with the optimal $\alpha$-SMOTE. FADA-SMOTE-Ms is evaluated using 5 classifiers and 21 unbalanced datasets, and it is compared with 8 oversampling methods using three performance measures. FADA-SMOTE-Ms consistently outperforms other popular oversampling methods.
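Steps (c) to (f) of the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `smote_point` and `pick_alpha`, the grid search over $\alpha$, and the use of plain Euclidean distance in place of the fuzzy membership functions are all assumptions made for illustration. Only the "far from the majority data" objective is optimized; closeness to the minority class is assumed to follow from staying on the segment between two minority points.

```python
import numpy as np


def smote_point(x_i, x_j, alpha):
    """Classic SMOTE interpolation: a point on the segment from x_i to x_j."""
    return x_i + alpha * (x_j - x_i)


def pick_alpha(x_i, x_j, majority, m=2, grid=101):
    """Hypothetical helper: grid-search alpha in [0, 1] so that the synthetic
    point is as far as possible from the m majority points nearest the pair.

    Assumption: crisp Euclidean distance stands in for the fuzzy membership
    step, and a grid search stands in for the optimization model.
    """
    # Distance from each majority point to the closer of the two minority points.
    d_pair = np.minimum(
        np.linalg.norm(majority - x_i, axis=1),
        np.linalg.norm(majority - x_j, axis=1),
    )
    nearest = majority[np.argsort(d_pair)[:m]]  # the m closest majority points

    alphas = np.linspace(0.0, 1.0, grid)
    candidates = x_i + alphas[:, None] * (x_j - x_i)  # shape (grid, n_features)

    # Distance from every candidate to every selected majority point.
    dists = np.linalg.norm(candidates[:, None, :] - nearest[None, :, :], axis=2)

    # Maximize the distance to the closest of the selected majority points.
    score = dists.min(axis=1)
    return float(alphas[np.argmax(score)])


if __name__ == "__main__":
    x_i = np.array([0.0, 0.0])
    x_j = np.array([1.0, 0.0])
    majority = np.array([[0.0, 0.1], [1.0, 0.1], [5.0, 5.0]])

    alpha = pick_alpha(x_i, x_j, majority, m=2)
    print("alpha:", alpha)
    print("synthetic sample:", smote_point(x_i, x_j, alpha))
```

In this toy example the two nearest majority points sit just above each end of the segment, so the search settles on the midpoint. Standard SMOTE would instead draw $\alpha$ uniformly at random from [0, 1].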
format Article
id doaj-art-1cb752e4f78344c29f63d059e9e26de1
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-1cb752e4f78344c29f63d059e9e26de1
2025-08-20T02:26:27Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
12
158742
158765
10.1109/ACCESS.2024.3480848
10716646
FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods
Roudani Mohammed (https://orcid.org/0009-0002-9321-830X), Mathematics and Data Science Laboratory (MDSL), Multidisciplinary Faculty of Taza, University Sidi Mohamed Ben Abdellah, Fez, Morocco
El Moutaouakil Karim (https://orcid.org/0000-0003-3922-5592), Mathematics and Data Science Laboratory (MDSL), Multidisciplinary Faculty of Taza, University Sidi Mohamed Ben Abdellah, Fez, Morocco
https://ieeexplore.ieee.org/document/10716646/
Classification
oversampling
SMOTE
unbalanced data
big data
spellingShingle Roudani Mohammed
El Moutaouakil Karim
FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods
IEEE Access
Classification
oversampling
SMOTE
unbalanced data
big data
title FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods
title_full FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods
title_fullStr FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods
title_full_unstemmed FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods
title_short FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods
title_sort fada smote ms fuzzy adaptative smote based methods
topic Classification
oversampling
SMOTE
unbalanced data
big data
url https://ieeexplore.ieee.org/document/10716646/
work_keys_str_mv AT roudanimohammed fadasmotemsfuzzyadaptativesmotebasedmethods
AT elmoutaouakilkarim fadasmotemsfuzzyadaptativesmotebasedmethods