A novel oversampling method based on Wasserstein CGAN for imbalanced classification

Abstract: Class imbalance is a crucial challenge in classification tasks, and in recent years, with the advancements in deep learning, research on oversampling techniques based on GANs has proliferated. These techniques have proven to be excellent in addressing the class imbalance issue by capturing...

Bibliographic Details
Main Authors: Hongfang Zhou, Heng Pan, Kangyun Zheng, Zongling Wu, Qingyu Xiang
Format: Article
Language: English
Published: SpringerOpen 2025-02-01
Series: Cybersecurity
Subjects: CGAN; Imbalanced data; Classification; Oversampling; K-means; k-nearest neighbors
Online Access: https://doi.org/10.1186/s42400-024-00290-0
author Hongfang Zhou
Heng Pan
Kangyun Zheng
Zongling Wu
Qingyu Xiang
author_sort Hongfang Zhou
collection DOAJ
description Abstract: Class imbalance is a crucial challenge in classification tasks, and in recent years, with the advancements in deep learning, research on oversampling techniques based on GANs has proliferated. These techniques have proven to be excellent in addressing the class imbalance issue by capturing the distributional features of minority samples during training and generating high-quality new samples. However, oversampling methods based on GANs may suffer from gradient vanishing, resulting in mode collapse, and may introduce noise and blur class boundaries when generating new samples. This paper proposes a novel oversampling method based on a conditional GAN (CGAN) incorporating Wasserstein distance. It generates an initial balanced dataset from minority class samples using the CGAN oversampling approach and then uses a noise and boundary recognition method based on K-means and the k-nearest neighbors algorithm to address the noise and boundary-blurring issues. The proposed method generates new samples that are highly consistent with the original sample distribution and effectively solves the problems of noise data and class boundary blurring. Experimental results on multiple public datasets show that the proposed method achieves significant improvements in evaluation metrics such as Recall, F1_score, G-mean, and AUC.
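The abstract names Recall, F1_score, and G-mean without defining them in this record. The following is a minimal sketch, assuming the standard binary definitions with the minority class treated as the positive class; the function name `imbalance_metrics` and the toy labels are hypothetical and are not taken from the paper. AUC is left out because it needs classifier scores rather than hard labels.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, F1 and G-mean for a binary task (hypothetical helper).

    `positive` is the label of the minority class (an assumption here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Recall (sensitivity): share of minority samples that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of majority samples that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # G-mean is the geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "f1": f1, "g_mean": g_mean}


# Toy imbalanced example: 2 minority (1) and 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = imbalance_metrics(y_true, y_pred)
```

On the toy labels, plain accuracy would be 0.8 even though only half of the minority samples are recovered, which is why recall-oriented metrics such as these are commonly reported for imbalanced data.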
format Article
id doaj-art-7b7a7c02936b4994b4cd381fbd42f54d
institution Kabale University
issn 2523-3246
language English
publishDate 2025-02-01
publisher SpringerOpen
record_format Article
series Cybersecurity
spelling doaj-art-7b7a7c02936b4994b4cd381fbd42f54d; 2025-02-02T12:30:05Z; eng; SpringerOpen; Cybersecurity; ISSN 2523-3246; 2025-02-01; vol. 8, no. 1, pp. 1-20; 10.1186/s42400-024-00290-0
Hongfang Zhou (School of Computer Science and Engineering, Xi’an University of Technology)
Heng Pan (School of Computer Science and Engineering, Xi’an University of Technology)
Kangyun Zheng (School of Computer Science and Engineering, Xi’an University of Technology)
Zongling Wu (School of Computer Science and Engineering, Xi’an University of Technology)
Qingyu Xiang (School of Chemical Engineering, Northwest University)
title A novel oversampling method based on Wasserstein CGAN for imbalanced classification
topic CGAN
Imbalanced data
Classification
Oversampling
K-means
k-nearest neighbors
url https://doi.org/10.1186/s42400-024-00290-0