GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting
Imbalanced data significantly affects the performance of standard classification models. Data-level approaches primarily use oversampling methods, such as the synthetic minority oversampling technique (SMOTE), to address this problem. However, because methods such as SMOTE generate instances via lin...
| Main Authors: | Seung Jee Yang, Kyungjoon Cha |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | GMM, imbalanced classification, oversampling |
| Online Access: | https://ieeexplore.ieee.org/document/10804168/ |
| Field | Value |
|---|---|
| author | Seung Jee Yang; Kyungjoon Cha |
| collection | DOAJ |
| description | Imbalanced data significantly degrade the performance of standard classification models. Data-level approaches primarily use oversampling methods, such as the synthetic minority oversampling technique (SMOTE), to address this problem. However, because SMOTE and similar methods generate instances by linear interpolation between neighboring minority instances, the synthetic data can take on a star- or tree-like shape. To address this issue, some methods apply Gaussian weights to the linear interpolation. In this study, we propose Gaussian-based minority oversampling with adaptive outlier filtering and class overlap weighting (GMO-AC) for imbalanced datasets. Unlike existing oversampling techniques, our method employs a Gaussian mixture model (GMM) to approximate the distribution of the minority class and generates new instances that follow this distribution. Because outliers can distort this approximation, GMO-AC identifies outliers by calculating the Mahalanobis distance of each instance together with the covariance determinant, and it uses segmented linear regression to assess whether an instance falls outside the expected distribution. In addition, we define a degree of class overlap and generate additional instances in overlapping areas to improve classification of the minority class there. Experiments on synthetic and benchmark datasets compared GMO-AC with other methods, such as SMOTE; GMO-AC yielded better AUROC and G-mean than the compared methods. |
| format | Article |
| id | doaj-art-c344967202df4c0e9be6373b6acb7c4d |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| series | IEEE Access |
| doi | 10.1109/ACCESS.2024.3518573 |
| volume_pages | Vol. 12, pp. 192494–192509 |
| author_affiliations | Seung Jee Yang (https://orcid.org/0000-0002-2250-5195): Department of Applied Statistics, Hanyang University, Seongdong-gu, Seoul, South Korea; Kyungjoon Cha (https://orcid.org/0000-0003-2261-0785): Institute for Convergence of Basic Science, Hanyang University, Seongdong-gu, Seoul, South Korea |
| record_timestamp | 2025-08-20T01:57:08Z |
| title | GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting |
| topic | GMM imbalanced classification oversampling |
| url | https://ieeexplore.ieee.org/document/10804168/ |
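
For context on the interpolation issue the abstract raises, the sketch below shows classic SMOTE-style generation: each synthetic point is placed at a uniformly random position on the segment between a minority instance and one of its k nearest minority neighbors. This is a minimal NumPy illustration, not the authors' code; the function name `smote_interpolate` and its parameters are hypothetical.

```python
import numpy as np


def smote_interpolate(X_min, n_new, k=5, seed=None):
    """Generate n_new synthetic points by SMOTE-style linear interpolation.

    Hypothetical helper for illustration. X_min is an (n, d) array of
    minority-class instances and must contain more than k rows.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise squared Euclidean distances among minority instances.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)        # seed instances
    nn = neighbors[base, rng.integers(0, k, size=n_new)]  # one random neighbor each
    gap = rng.random((n_new, 1))                          # interpolation factor in [0, 1)
    # Each new point lies on the segment from X_min[base] to X_min[nn].
    return X_min[base] + gap * (X_min[nn] - X_min[base])
```

Because every output lies on a segment joining two existing minority instances, the synthetic points concentrate along those segments, which produces the star- or tree-like pattern described in the abstract.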
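
The core idea of fitting a Gaussian mixture to the minority class and sampling new instances from it can be sketched with scikit-learn's `GaussianMixture`. Choosing the number of components by BIC, the `max_components` cap, and the helper name `gmm_oversample` are assumptions for illustration; the abstract does not say how GMO-AC selects or configures its mixture, and this sketch omits its outlier filtering and overlap weighting.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_oversample(X_min, n_new, max_components=5, seed=0):
    """Fit a GMM to minority instances and draw n_new synthetic samples.

    Hypothetical helper: the component count is chosen by lowest BIC,
    which is an assumption of this sketch, not the paper's procedure.
    """
    X_min = np.asarray(X_min, dtype=float)
    best, best_bic = None, np.inf
    # Try 1..max_components components (never more than the sample count).
    for k in range(1, min(max_components, len(X_min)) + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=seed)
        gmm.fit(X_min)
        bic = gmm.bic(X_min)
        if bic < best_bic:
            best, best_bic = gmm, bic
    # sample() returns (samples, component_labels); only the samples are needed.
    samples, _ = best.sample(n_new)
    return samples
```

Unlike interpolation, samples drawn this way can fall anywhere the fitted density is non-negligible, so they fill the region around the minority instances rather than the segments between them.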
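
The abstract states that GMO-AC uses Mahalanobis distances and segmented linear regression to decide whether an instance lies outside the expected distribution, without giving the exact procedure. One hypothetical reading, sketched below, sorts the squared Mahalanobis distances, fits a two-segment line to the sorted curve, and flags instances past the breakpoint when the upper segment is much steeper. The classical mean and covariance estimates, the `min_seg` and `slope_ratio` parameters, and all function names are assumptions; the paper's use of the covariance determinant is not reproduced here (a robust estimator such as scikit-learn's `MinCovDet` is one possible substitute).

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def _line_fit(x, y):
    """Return (sum of squared residuals, slope) of a least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    return np.sum((y - (slope * x + intercept)) ** 2), slope


def segmented_outlier_mask(X, min_seg=3, slope_ratio=10.0):
    """Hypothetical outlier rule: cut sorted distances at a two-segment breakpoint.

    Every breakpoint leaving at least min_seg points per segment is scanned;
    points above the best one are flagged only if the upper segment's slope
    exceeds slope_ratio times the lower segment's slope.
    """
    d2 = mahalanobis_sq(X)
    order = np.argsort(d2)
    y = d2[order]
    x = np.arange(len(y), dtype=float)
    best = None
    for b in range(min_seg, len(y) - min_seg + 1):
        sse_lo, slope_lo = _line_fit(x[:b], y[:b])
        sse_hi, slope_hi = _line_fit(x[b:], y[b:])
        if best is None or sse_lo + sse_hi < best[0]:
            best = (sse_lo + sse_hi, b, slope_lo, slope_hi)
    mask = np.zeros(len(y), dtype=bool)
    if best is not None and best[3] > slope_ratio * max(best[2], 1e-12):
        mask[order[best[1]:]] = True  # map sorted positions back to rows
    return mask
```

On clean Gaussian data the steep tail of the sorted distances can still trigger this slope rule, so a practical implementation would need a better-calibrated criterion.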
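
The abstract defines a degree of class overlap but does not give its formula. A common proxy, used here purely as an assumption, is the fraction of a minority instance's k nearest neighbors that belong to other classes; the sketch then splits a synthetic-sample budget so that instances in overlapping regions receive more samples. The weighting `1 + overlap` and the function names are hypothetical.

```python
import numpy as np


def overlap_degree(X, y, minority_label=1, k=5):
    """Fraction of each minority instance's k nearest neighbors from other classes.

    Hypothetical kNN-based proxy for class overlap.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx_min = np.flatnonzero(y == minority_label)
    # Euclidean distances from every minority instance to every instance.
    dist = np.linalg.norm(X[idx_min, None, :] - X[None, :, :], axis=2)
    dist[np.arange(len(idx_min)), idx_min] = np.inf  # ignore self-matches
    nn = np.argsort(dist, axis=1)[:, :k]
    return (y[nn] != minority_label).mean(axis=1)


def allocate_samples(overlap, n_new):
    """Split n_new synthetic samples across minority instances, favoring overlap.

    Weights of 1 + overlap are a hypothetical choice for illustration.
    """
    weights = 1.0 + np.asarray(overlap, dtype=float)
    raw = n_new * weights / weights.sum()
    counts = np.floor(raw).astype(int)
    # Give the leftover samples to the largest fractional parts so counts sum to n_new.
    remainder = n_new - counts.sum()
    counts[np.argsort(raw - counts)[::-1][:remainder]] += 1
    return counts
```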
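
The reported evaluation metrics are standard and can be computed directly. G-mean is the geometric mean of sensitivity (minority-class recall) and specificity, and AUROC equals the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one, with ties counted as one half. The NumPy helpers below are illustrative; `sklearn.metrics.roc_auc_score` computes the same binary AUROC.

```python
import numpy as np


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (positive-class recall) and specificity."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive
    sensitivity = np.mean(y_pred[pos] == positive)
    specificity = np.mean(y_pred[~pos] != positive)
    return float(np.sqrt(sensitivity * specificity))


def auroc(y_true, scores, positive=1):
    """AUROC as P(random positive outscores random negative), ties counted as 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    s_pos = scores[y_true == positive][:, None]
    s_neg = scores[y_true != positive][None, :]
    # Compare every positive score with every negative score.
    return float(np.mean((s_pos > s_neg) + 0.5 * (s_pos == s_neg)))
```

A classifier that labels every instance as the majority class has a G-mean of 0 regardless of its overall accuracy, which is why G-mean is widely used for imbalanced problems.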