High Density Subspace Clustering Algorithm for High Dimensional Data

Bibliographic Details
Main Authors: WAN Jing, ZHENG Longjun, HE Yunbin, LI Song
Format: Article
Language: Chinese (zho)
Published: Harbin University of Science and Technology Publications 2020-08-01
Series: Journal of Harbin University of Science and Technology
Online Access: https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=1909
Description
Summary: High-dimensional data are sparse and prone to the curse of dimensionality, which makes it difficult to ensure both the precision and the efficiency of clustering them. Therefore, subspace clustering is adopted to reduce the impact of sparsity and the curse of dimensionality on the clustering results. First, random sampling is used to select dimensions suitable for clustering from the high-dimensional data to generate a subspace, and the Hoeffding bound is applied to ensure the validity of the sampling results. Second, mixed grids are generated in the subspace by exploiting grid adjacency, which preserves the integrity of the data and increases the density of the subspace. Finally, dimension pruning is carried out according to the similarity and dissimilarity of the subspaces to further increase the subspace density. The algorithm achieves better results on UCI data sets and shows better scalability and noise robustness.
ISSN:1007-2683
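The summary names the sampling and grid steps but gives no formulas. The Python below is a minimal sketch, not the authors' method. It assumes data normalised to [0, 1] and uses a hypothetical dimension-selection rule: keep a dimension if its most populated 1-D interval holds at least `threshold` of a random sample. It shows how a Hoeffding bound can size that sample. For values in [0, 1], P(|sample mean - true mean| >= epsilon) <= 2 exp(-2 n epsilon^2), so n >= ln(2/delta) / (2 epsilon^2). It also shows how dense cells in the chosen subspace can be merged through grid adjacency. All names and parameters (`epsilon`, `delta`, `bins`, `threshold`, `min_pts`) are illustrative. The sketch uses plain fixed-width cells rather than the paper's mixed grids and omits the dimension-pruning step.

```python
import math
import random
from collections import defaultdict, deque

def hoeffding_sample_size(epsilon, delta):
    # For i.i.d. values in [0, 1], P(|sample mean - true mean| >= epsilon)
    # <= 2 * exp(-2 * n * epsilon**2); return the smallest n making this <= delta.
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def cell_of(point, dims, bins):
    # Map a point with coordinates in [0, 1] to its grid cell in the subspace `dims`.
    return tuple(min(int(point[d] * bins), bins - 1) for d in dims)

def select_dimensions(data, bins, threshold, epsilon=0.05, delta=0.05, seed=0):
    # Hypothetical rule: keep dimension d if, on a random sample sized by the
    # Hoeffding bound, its fullest 1-D interval holds >= `threshold` of the sample.
    rng = random.Random(seed)
    n = min(len(data), hoeffding_sample_size(epsilon, delta))
    sample = rng.sample(data, n)
    selected = []
    for d in range(len(data[0])):
        counts = defaultdict(int)
        for p in sample:
            counts[cell_of(p, [d], bins)] += 1
        if max(counts.values()) / n >= threshold:
            selected.append(d)
    return selected

def dense_grid_clusters(data, dims, bins, min_pts):
    # Count points per cell, keep cells with >= min_pts points, then merge dense
    # cells that touch (differ by at most 1 in every coordinate) using BFS.
    counts = defaultdict(int)
    for p in data:
        counts[cell_of(p, dims, bins)] += 1
    dense = {c for c, k in counts.items() if k >= min_pts}
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            cell = queue.popleft()
            component.append(cell)
            for other in dense:
                if other not in seen and all(abs(a - b) <= 1 for a, b in zip(cell, other)):
                    seen.add(other)
                    queue.append(other)
        clusters.append(sorted(component))
    return clusters

# Synthetic example: two tight blobs in dimensions 0 and 1, uniform noise in dimension 2.
rng = random.Random(1)
centres = [(0.2, 0.7), (0.8, 0.3)]
data = []
for i in range(500):
    cx, cy = centres[i % 2]
    data.append([cx + rng.uniform(-0.03, 0.03), cy + rng.uniform(-0.03, 0.03), rng.random()])

dims = select_dimensions(data, bins=10, threshold=0.2)
clusters = dense_grid_clusters(data, dims, bins=10, min_pts=10)
print(dims)
print(sorted(clusters))
```

In this example the noisy third dimension spreads its points across all ten intervals and is dropped. The two remaining dimensions each yield one group of four adjacent dense cells per blob. The pairwise adjacency scan is quadratic in the number of dense cells, which keeps the sketch short but would need an index for large grids.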