EMM-CLODS: An Effective Microcluster and Minimal Pruning CLustering-Based Technique for Detecting Outliers in Data Streams

Detecting outliers in data streams is a challenging problem because, in a data stream scenario, scanning the data multiple times is infeasible and the incoming data keep evolving. Over the years, a common approach to outlier detection has been to use clustering-based methods, but these methods hav...

Bibliographic Details
Main Authors: Mohamed Jaward Bah, Hongzhi Wang, Li-Hui Zhao, Ji Zhang, Jie Xiao
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2021/9178461
collection DOAJ
description Detecting outliers in data streams is a challenging problem because, in a data stream scenario, scanning the data multiple times is infeasible and the incoming data keep evolving. Over the years, a common approach to outlier detection has been to use clustering-based methods, but these methods have inherent challenges and drawbacks. These include clustering sparse data points effectively (a matter of clustering quality), coping with continuous, fast-incoming data streams, high memory and time consumption, and limited outlier detection accuracy. This paper proposes an effective clustering-based approach to detecting outliers in evolving data streams: a new method called Effective Microcluster and Minimal pruning CLustering-based method for Outlier detection in Data Streams (EMM-CLODS). The method first applies a microclustering technique to cluster dense data points, and then handles the objects within a sliding window according to the relevance of their status to their respective neighbors or position. Experimental studies on both synthetic and real-world datasets show that the technique performs well, with minimal memory and time consumption compared to the other baseline algorithms, making it a promising technique for outlier detection problems in data streams.
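The description names two ingredients, microclustering of dense points and a sliding window, without giving algorithmic detail. The following is a minimal sketch of how those two ideas are commonly combined in distance-based outlier detection; it is not the EMM-CLODS algorithm itself. The neighbor-count outlier definition, the parameters `radius`, `k`, and `window_size`, the function names, and the greedy grouping rule are all assumptions made for illustration, and the window is recomputed from scratch on every arrival rather than updated incrementally.

```python
from collections import deque
from math import dist


def find_outliers(points, radius, k):
    """Return indices of points with fewer than k neighbors within radius.

    Assumed definition (not taken from the paper): a point is an outlier
    if fewer than k other points in the window lie within `radius` of it.
    """
    n = len(points)
    safe = [False] * n  # True once a point is covered by a micro-cluster

    # Step 1: greedy micro-clustering. Points within radius/2 of a common
    # center are pairwise within radius (triangle inequality), so a group
    # of at least k + 1 such points consists only of inliers.
    for i in range(n):
        if safe[i]:
            continue
        members = [j for j in range(n)
                   if not safe[j] and dist(points[i], points[j]) <= radius / 2]
        if len(members) >= k + 1:
            for j in members:
                safe[j] = True

    # Step 2: only the leftover points need an explicit neighbor count.
    outliers = []
    for i in range(n):
        if safe[i]:
            continue
        neighbors = sum(1 for j in range(n)
                        if j != i and dist(points[i], points[j]) <= radius)
        if neighbors < k:
            outliers.append(i)
    return outliers


def stream_outliers(stream, window_size, radius, k):
    """Yield the current outliers after each arrival in a sliding window."""
    window = deque(maxlen=window_size)  # oldest point expires automatically
    for point in stream:
        window.append(point)
        pts = list(window)
        yield [pts[i] for i in find_outliers(pts, radius, k)]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    for step, found in enumerate(stream_outliers(data, 5, 1.0, 2), start=1):
        print(step, found)
```

The point of the micro-cluster step in this sketch is that every point it covers skips the neighbor count entirely, which is where memory and time savings of the kind the description mentions would typically come from in dense regions.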
id doaj-art-5c2e550bbb8f49369626d71bfcd500af
institution Kabale University
issn 1076-2787
1099-0526
spelling doaj-art-5c2e550bbb8f49369626d71bfcd500af 2025-02-03T06:10:45Z eng Wiley Complexity 1076-2787 1099-0526 2021-01-01 2021 10.1155/2021/9178461 9178461
Mohamed Jaward Bah (Zhejiang Lab, Hangzhou, China)
Hongzhi Wang (Harbin Institute of Technology, Harbin, China)
Li-Hui Zhao (North University of China, Taiyuan, China)
Ji Zhang (University of Southern Queensland, Toowoomba, Australia)
Jie Xiao (Hangzhou Yugu Technology Co., Ltd., Hangzhou, China)
http://dx.doi.org/10.1155/2021/9178461