FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods
The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the most well-known methods to solve the unequal class distribution problem in imbalanced datasets. However, it has three shortcomings: (1) it may cause t...
| Main Authors: | Roudani Mohammed, El Moutaouakil Karim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Classification; oversampling; SMOTE; unbalanced data; big data |
| Online Access: | https://ieeexplore.ieee.org/document/10716646/ |
| _version_ | 1850150807501012992 |
|---|---|
| author | Roudani Mohammed; El Moutaouakil Karim |
| author_facet | Roudani Mohammed; El Moutaouakil Karim |
| author_sort | Roudani Mohammed |
| collection | DOAJ |
| description | The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the best-known methods for addressing unequal class distribution in imbalanced datasets. However, it has three shortcomings: (1) it may cause over-generalization by oversampling noisy samples, (2) it oversamples uninformative samples, and (3) it increases the overlap between classes around the class boundaries. Various SMOTE-based approaches have been proposed to handle these problems, but most of them introduce hyperparameters and tend to generate noise because the synthetic sample is generated at random in the area delimited by randomly chosen minority data. This research introduces an improved SMOTE-based method, Fuzzy-ADAptative-SMOTE-Based-Methods (FADA-SMOTE-Ms), which targets all three problems at the same time. To this end, the $\alpha$-SMOTE is chosen so that the synthetic data is as far as possible from the two closest majority data. More precisely, the method proceeds in six steps: (a) clustering the minority class into *k* groups, (b) selecting a safe region, (c) selecting two random minority data, (d) finding the M closest majority data to these minority data using original membership functions based on fuzzy mean and filtering results, (e) finding the $\alpha$-SMOTE that produces synthetic data as close as possible to the minority class and as far as possible from the M majority data by solving a very simple multi-objective mathematical optimization model, and (f) using SMOTE to generate synthetic samples with the optimal $\alpha$-SMOTE. FADA-SMOTE-Ms is evaluated with 5 classifiers on 21 unbalanced datasets and is compared with 8 oversampling methods using three performance measures. FADA-SMOTE-Ms consistently outperforms other popular oversampling methods. |
| format | Article |
| id | doaj-art-1cb752e4f78344c29f63d059e9e26de1 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-1cb752e4f78344c29f63d059e9e26de1; 2025-08-20T02:26:27Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 158742-158765; DOI 10.1109/ACCESS.2024.3480848; IEEE document 10716646; FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods; Roudani Mohammed (https://orcid.org/0009-0002-9321-830X), Mathematics and Data Science Laboratory (MDSL), Multidisciplinary Faculty of Taza, University Sidi Mohamed Ben Abdellah, Fez, Morocco; El Moutaouakil Karim (https://orcid.org/0000-0003-3922-5592), Mathematics and Data Science Laboratory (MDSL), Multidisciplinary Faculty of Taza, University Sidi Mohamed Ben Abdellah, Fez, Morocco; https://ieeexplore.ieee.org/document/10716646/; Classification; oversampling; SMOTE; unbalanced data; big data |
| title | FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods |
| title_sort | fada smote ms fuzzy adaptative smote based methods |
| topic | Classification; oversampling; SMOTE; unbalanced data; big data |
| url | https://ieeexplore.ieee.org/document/10716646/ |
| work_keys_str_mv | AT roudanimohammed fadasmotemsfuzzyadaptativesmotebasedmethods AT elmoutaouakilkarim fadasmotemsfuzzyadaptativesmotebasedmethods |
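
Steps (c) to (f) of the description above can be illustrated with a minimal Python sketch. SMOTE places a synthetic sample on the segment between two minority samples, x_new = x_i + α (x_j − x_i) with α in [0, 1]; here α is picked by a grid search that pushes the synthetic point away from the M closest majority samples. This record does not give the paper's fuzzy membership functions or its multi-objective optimization model, so plain Euclidean distance and a single-objective grid search are used as stand-ins, and every function name and parameter below is hypothetical.

```python
import math
import random


def interpolate(x_i, x_j, alpha):
    """Classic SMOTE step: a point on the segment between two minority samples."""
    return [a + alpha * (b - a) for a, b in zip(x_i, x_j)]


def closest_majority(x_i, x_j, majority, m):
    """Return the m majority samples nearest to either sample of the pair.

    Assumption: plain Euclidean distance replaces the fuzzy membership
    functions mentioned in the abstract, which this record does not define.
    """
    def dist_to_pair(p):
        return min(math.dist(p, x_i), math.dist(p, x_j))

    return sorted(majority, key=dist_to_pair)[:m]


def best_alpha(x_i, x_j, neighbours, steps=101):
    """Grid-search alpha in [0, 1] to maximise the distance between the
    synthetic point and its nearest majority neighbour.

    Assumption: a single-objective stand-in for the paper's optimisation model.
    """
    candidates = [k / (steps - 1) for k in range(steps)]

    def clearance(alpha):
        synthetic = interpolate(x_i, x_j, alpha)
        return min(math.dist(synthetic, p) for p in neighbours)

    return max(candidates, key=clearance)


def synthesize(minority, majority, m=2, rng=None):
    """Draw two random minority samples and return (synthetic point, alpha)."""
    rng = rng or random.Random(0)
    x_i, x_j = rng.sample(minority, 2)
    neighbours = closest_majority(x_i, x_j, majority, m)
    alpha = best_alpha(x_i, x_j, neighbours)
    return interpolate(x_i, x_j, alpha), alpha


if __name__ == "__main__":
    minority = [[0.0, 0.0], [2.0, 0.0]]
    majority = [[0.0, 1.0], [10.0, 10.0]]
    print(synthesize(minority, majority))
```

Because this stand-in only maximises clearance from the majority samples, it tends to return a point at or near one end of the segment. The abstract's additional objective (staying as close as possible to the minority class), the clustering in step (a), and the safe-region selection in step (b) are not modelled here.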