EMM-CLODS: An Effective Microcluster and Minimal Pruning CLustering-Based Technique for Detecting Outliers in Data Streams
Detecting outliers in data streams is a challenging problem since, in a data stream scenario, scanning the data multiple times is unfeasible, and the incoming streaming data keep evolving. Over the years, a common approach to outlier detection is using clustering-based methods, but these methods hav...
Main Authors: | Mohamed Jaward Bah, Hongzhi Wang, Li-Hui Zhao, Ji Zhang, Jie Xiao |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2021-01-01 |
Series: | Complexity |
Online Access: | http://dx.doi.org/10.1155/2021/9178461 |
_version_ | 1832549653005991936 |
---|---|
author | Mohamed Jaward Bah, Hongzhi Wang, Li-Hui Zhao, Ji Zhang, Jie Xiao |
author_sort | Mohamed Jaward Bah |
collection | DOAJ |
description | Detecting outliers in data streams is challenging because scanning the data multiple times is infeasible and the incoming data keep evolving. Clustering-based methods have long been a common approach to outlier detection, but they have inherent drawbacks: difficulty clustering sparse data points effectively (which depends on the quality of the clustering method), difficulty keeping up with continuous, fast-incoming data streams, high memory and time consumption, and limited outlier detection accuracy. This paper proposes an effective clustering-based approach to detecting outliers in evolving data streams, called Effective Microcluster and Minimal pruning CLustering-based method for Outlier detection in Data Streams (EMM-CLODS). The method first applies a microclustering technique to cluster dense data points and then effectively handles the objects within a sliding window according to the relevance of their status to their respective neighbors or position. Experimental studies on both synthetic and real-world datasets show that the technique performs well, with minimal memory and time consumption compared with the baseline algorithms, making it a promising technique for outlier detection in data streams. |
format | Article |
id | doaj-art-5c2e550bbb8f49369626d71bfcd500af |
institution | Kabale University |
issn | 1076-2787, 1099-0526 |
language | English |
publishDate | 2021-01-01 |
publisher | Wiley |
record_format | Article |
series | Complexity |
spelling | doaj-art-5c2e550bbb8f49369626d71bfcd500af; 2025-02-03T06:10:45Z; eng; Wiley; Complexity; 1076-2787; 1099-0526; 2021-01-01; 2021; 10.1155/2021/9178461; 9178461; EMM-CLODS: An Effective Microcluster and Minimal Pruning CLustering-Based Technique for Detecting Outliers in Data Streams; Mohamed Jaward Bah (Zhejiang Lab, Hangzhou, China); Hongzhi Wang (Harbin Institute of Technology, Harbin, China); Li-Hui Zhao (North University of China, Taiyuan, China); Ji Zhang (University of Southern Queensland, Toowoomba, Australia); Jie Xiao (Hangzhou Yugu Technology Co., Ltd., Hangzhou, China) |
title | EMM-CLODS: An Effective Microcluster and Minimal Pruning CLustering-Based Technique for Detecting Outliers in Data Streams |
url | http://dx.doi.org/10.1155/2021/9178461 |