An ensemble learning method with GAN-based sampling and consistency check for anomaly detection of imbalanced data streams with concept drift.

Bibliographic Details
Main Authors: Yansong Liu, Shuang Wang, He Sui, Li Zhu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0292140&type=printable
collection DOAJ
description Many real-world data streams are both imbalanced and subject to concept drift, and handling the two together is one of the most critical challenges in anomaly detection. Learning from nonstationary data streams for anomaly detection has been well studied in recent years. However, most studies assume that the class distribution of the data stream is relatively balanced, and only a few approaches tackle the joint issue of imbalance and concept drift. To address this joint issue, we propose an ensemble learning method with generative adversarial network (GAN)-based sampling and consistency check (EGSCC). First, we design a comprehensive anomaly detection framework that comprises a GAN-based oversampling module, an ensemble classifier, and a consistency check module. Next, we introduce double encoders into the GAN to better capture the distribution characteristics of imbalanced data for oversampling. Then, we apply stacking ensemble learning to deal with concept drift: four base classifiers (SVM, KNN, DT, and RF) are used in the first layer, and LR is used as the meta classifier in the second layer. In addition, we perform a consistency check between each incremental instance and the check set to determine, by statistical learning rather than a threshold-based method, whether the instance is anomalous. The validation set is dynamically updated according to the consistency check result. Finally, three artificial data sets obtained from the Massive Online Analysis platform and two real data sets are used to evaluate the proposed method in four respects: detection performance, parameter sensitivity, algorithm cost, and anti-noise ability. Experimental results show that the proposed method has significant advantages in anomaly detection of imbalanced data streams with concept drift.
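The two-layer stacking arrangement named in the abstract can be sketched with scikit-learn. This is a minimal illustration and not the authors' implementation: it assumes the abbreviations stand for support vector machine, k-nearest neighbors, decision tree, random forest, and logistic regression; it uses a synthetic, static, imbalanced data set in place of a data stream; and every hyperparameter is an arbitrary choice made for the example. The GAN-based oversampling and the consistency check are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with roughly 10% minority (anomaly) class.
# Assumption for illustration only; these are not the paper's data sets.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# First layer: the four base classifiers listed in the abstract.
base_learners = [
    ("svm", SVC(probability=True, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Second layer: logistic regression trained on cross-validated
# predictions of the base classifiers.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print("held-out accuracy:", stack.score(X_test, y_test))
```

On imbalanced data, accuracy alone can look good even when most anomalies are missed, so a real evaluation would more likely report measures such as recall, F1, or G-mean on the minority class.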
id doaj-art-9061ed9e486140a4acd25577ae28beae
institution OA Journals
issn 1932-6203
spelling doaj-art-9061ed9e486140a4acd25577ae28beae; 2025-08-20T02:10:50Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2024-01-01; 19(1): e0292140; 10.1371/journal.pone.0292140
url https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0292140&type=printable