GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting

Imbalanced data significantly affects the performance of standard classification models. Data-level approaches primarily use oversampling methods, such as the synthetic minority oversampling technique (SMOTE), to address this problem. However, because methods such as SMOTE generate instances via lin...

Bibliographic Details
Main Authors: Seung Jee Yang, Kyungjoon Cha
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: GMM, imbalanced classification, oversampling
Online Access: https://ieeexplore.ieee.org/document/10804168/
_version_ 1850254388988215296
author Seung Jee Yang
Kyungjoon Cha
author_facet Seung Jee Yang
Kyungjoon Cha
author_sort Seung Jee Yang
collection DOAJ
description Imbalanced data significantly affects the performance of standard classification models. Data-level approaches primarily use oversampling methods, such as the synthetic minority oversampling technique (SMOTE), to address this problem. However, because methods such as SMOTE generate instances via linear interpolation, the synthetic data can form star- or tree-like shapes in the feature space. Some methods therefore apply Gaussian weights to the linear interpolation. In this study, we propose Gaussian-based minority oversampling with adaptive outlier filtering and class overlap weighting (GMO-AC) for imbalanced datasets. Unlike existing oversampling techniques, our method employs a Gaussian mixture model (GMM) to approximate the distribution of the minority class and generates new instances that follow this distribution. Because outliers can distort the distribution approximation, GMO-AC identifies them by calculating the Mahalanobis distance for each instance and the covariance determinant, and uses segmented linear regression to assess whether an instance falls outside the expected distribution. In addition, we define a degree of class overlap and generate additional instances in the overlapping areas to improve the classification of the minority class there. Experiments on synthetic and benchmark datasets compared GMO-AC with other methods, such as SMOTE. The results show that GMO-AC achieved better AUROC and G-mean.
format Article
id doaj-art-c344967202df4c0e9be6373b6acb7c4d
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c344967202df4c0e9be6373b6acb7c4d
2025-08-20T01:57:08Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
12
192494
192509
10.1109/ACCESS.2024.3518573
10804168
GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting
Seung Jee Yang (0) https://orcid.org/0000-0002-2250-5195
Kyungjoon Cha (1) https://orcid.org/0000-0003-2261-0785
Department of Applied Statistics, Hanyang University, Seongdong-gu, Seoul, South Korea
Institute for Convergence of Basic Science, Hanyang University, Seongdong-gu, Seoul, South Korea
https://ieeexplore.ieee.org/document/10804168/
GMM
imbalanced classification
oversampling
spellingShingle Seung Jee Yang
Kyungjoon Cha
GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting
IEEE Access
GMM
imbalanced classification
oversampling
title GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting
title_full GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting
title_fullStr GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting
title_full_unstemmed GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting
title_short GMO-AC: Gaussian-Based Minority Oversampling With Adaptive Outlier Filtering and Class Overlap Weighting
title_sort gmo ac gaussian based minority oversampling with adaptive outlier filtering and class overlap weighting
topic GMM
imbalanced classification
oversampling
url https://ieeexplore.ieee.org/document/10804168/
work_keys_str_mv AT seungjeeyang gmoacgaussianbasedminorityoversamplingwithadaptiveoutlierfilteringandclassoverlapweighting
AT kyungjooncha gmoacgaussianbasedminorityoversamplingwithadaptiveoutlierfilteringandclassoverlapweighting
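
The description above outlines two steps that a small example may make more concrete: filtering minority-class outliers by Mahalanobis distance, then sampling new instances from a Gaussian fitted to the remaining points. The Python sketch below is an illustration under strong simplifying assumptions and is not the authors' implementation: it handles 2-D data only, fits a single Gaussian (a one-component stand-in for the GMM), uses a fixed distance threshold in place of the adaptive, segmented-regression-based rule, and omits class overlap weighting entirely. All function names and the `threshold` parameter are hypothetical.

```python
import math
import random


def mean_and_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2-D point from a Gaussian (mean, cov)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2  # covariance determinant
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form d^T * inverse(cov) * d, written out for the 2x2 case.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


def filter_outliers(points, threshold=3.0):
    """Keep points within `threshold` (an assumed fixed cutoff, not the
    adaptive rule from the description) of the fitted Gaussian."""
    mean, cov = mean_and_cov(points)
    return [p for p in points if mahalanobis(p, mean, cov) <= threshold]


def sample_gaussian(mean, cov, n, rng):
    """Draw n points from N(mean, cov) using a 2x2 Cholesky factor."""
    sxx, sxy, syy = cov
    a = math.sqrt(sxx)
    b = sxy / a
    c = math.sqrt(max(syy - b * b, 0.0))
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mean[0] + a * z1, mean[1] + b * z1 + c * z2))
    return samples


def oversample(minority, n_new, threshold=3.0, seed=0):
    """Filter outliers, refit the Gaussian, and generate n_new instances."""
    rng = random.Random(seed)
    kept = filter_outliers(minority, threshold)
    mean, cov = mean_and_cov(kept)
    return kept, sample_gaussian(mean, cov, n_new, rng)


if __name__ == "__main__":
    base = [(1, 1), (1, -1), (-1, 1), (-1, -1), (1, 0), (-1, 0), (0, 1), (0, -1)]
    minority = [(x * s, y * s) for s in (1.0, 0.5) for x, y in base]
    minority.append((10.0, 10.0))  # one obvious outlier
    kept, synthetic = oversample(minority, n_new=5)
    print(len(minority), "minority points,", len(kept), "kept after filtering")
    print("synthetic instances:", synthetic)
```

Refitting the Gaussian after filtering matters in this sketch because a single distant point inflates the covariance estimate, which would otherwise spread the synthetic instances toward the outlier.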