An Efficient Density-Based Local Outlier Detection Approach for Scattered Data

Bibliographic Details
Main Authors: Shubin Su, Limin Xiao, Li Ruan, Fei Gu, Shupan Li, Zhaokai Wang, Rongbin Xu
Format: Article
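Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Outlier detection; local outlier factor; neighborhood variance; rough clustering; scattered dataset
Online Access: https://ieeexplore.ieee.org/document/8572736/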
author Shubin Su
Limin Xiao
Li Ruan
Fei Gu
Shupan Li
Zhaokai Wang
Rongbin Xu
collection DOAJ
description Since the local outlier factor (LOF) was first proposed, a large family of local outlier detection approaches has been derived from it. Because the existing approaches focus only on the overall separation between an object and its neighbors and ignore the degree of dispersion among them, their precision degrades to varying degrees on scattered datasets. In addition, although outliers make up only a small fraction of a dataset, the existing approaches compute a local outlier factor for every object, which greatly reduces their efficiency. In this paper, we redefine the local outlier factor as the local deviation coefficient (LDC), which takes full advantage of the distribution of an object and its neighbors. We then propose a safe approach for eliminating non-outlier objects, rough clustering based on multi-level queries (RCMLQ), which preprocesses a dataset to eliminate as many non-outlier objects as possible. Finally, we propose an efficient local outlier detection approach, efficient density-based local outlier detection for scattered data (E2DLOS), based on the LDC and RCMLQ. RCMLQ greatly reduces the number of objects for which a local outlier factor must be computed, and the LDC is more sensitive to the degree of anomaly in scattered datasets, so E2DLOS improves on existing local outlier detection approaches in both time efficiency and detection accuracy. Experiments show that the LDC better reflects the true abnormality of the data in scattered datasets. RCMLQ can also be used alongside traditional methods for speeding up nearest neighbor search, which further improves the efficiency of E2DLOS by about 16%.
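This record does not give the LDC formula or the RCMLQ procedure, so the following is only a minimal sketch of the original local outlier factor (LOF) that the abstract takes as its starting point. It is not the LDC, RCMLQ, or E2DLOS. The function name `local_outlier_factor`, the brute-force neighbor search, and the toy data are assumptions made for illustration.

```python
import numpy as np


def local_outlier_factor(X, k):
    """Classic LOF scores using a brute-force k-nearest-neighbor search.

    Illustrative baseline only; assumes no duplicate points, since a zero
    reachability distance would cause a division by zero.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; the diagonal is set to inf so that
    # an object is never counted as its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Indices and distances of the k nearest neighbors (ties are ignored).
    nn = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn, axis=1)
    k_dist = nn_dist[:, -1]  # k-distance of each object

    # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
    reach = np.maximum(k_dist[nn], nn_dist)
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # LOF: mean density of the neighbors divided by the object's own density.
    return lrd[nn].mean(axis=1) / lrd


# Toy data (hypothetical): a 5x5 grid plus one distant point at index 25.
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])
scores = local_outlier_factor(X, k=5)
print("highest-scoring index:", int(np.argmax(scores)))
```

Every object is scored here, so the cost grows with the full dataset size. That is the inefficiency the abstract says RCMLQ addresses by discarding likely non-outliers before any score is computed.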
format Article
id doaj-art-643471a8f14344faba3de8f147b3576c
institution DOAJ
issn 2169-3536
language English
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-643471a8f14344faba3de8f147b3576c
2025-08-20T02:46:49Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
Volume 7, pages 1006-1020
DOI: 10.1109/ACCESS.2018.2886197
IEEE Xplore document: 8572736
An Efficient Density-Based Local Outlier Detection Approach for Scattered Data
Shubin Su (https://orcid.org/0000-0003-1395-4578)
Limin Xiao
Li Ruan
Fei Gu
Shupan Li (https://orcid.org/0000-0002-5823-2037)
Zhaokai Wang
Rongbin Xu
All authors: State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
https://ieeexplore.ieee.org/document/8572736/
Outlier detection; local outlier factor; neighborhood variance; rough clustering; scattered dataset
title An Efficient Density-Based Local Outlier Detection Approach for Scattered Data
topic Outlier detection
local outlier factor
neighborhood variance
rough clustering
scattered dataset
url https://ieeexplore.ieee.org/document/8572736/