High Density Subspace Clustering Algorithm for High Dimensional Data
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | Harbin University of Science and Technology Publications, 2020-08-01 |
| Series: | Journal of Harbin University of Science and Technology |
| Subjects: | |
| Online Access: | https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=1909 |
| Summary: | High-dimensional data are sparse and subject to the curse of dimensionality, which makes it difficult to ensure the precision and efficiency of clustering. Subspace clustering is therefore adopted to reduce the impact of sparsity and the curse of dimensionality on the clustering results. First, random sampling is used to select the dimensions suitable for clustering from the high-dimensional data and generate subspaces, and the Hoeffding bound is applied to ensure the validity of the sampling results. Second, mixed grids are generated in each subspace using grid adjacency, which preserves the integrity of the data and increases the subspace density. Finally, dimensions are pruned according to the similarity and dissimilarity of the subspaces, which increases the subspace density further. The algorithm achieves better results on UCI data sets and performs better in scalability and noise resistance. |
| ISSN: | 1007-2683 |
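
The summary states that the Hoeffding bound is used to ensure the validity of the random sampling, but this record does not give the formula or parameters the article uses. As a minimal sketch, assuming the standard form of the bound for a bounded quantity with range `R`, the snippet below computes the error margin for a given sample size and the sample size needed for a given margin. The function names and the parameters `value_range`, `delta`, and `epsilon` are hypothetical and are not taken from the article.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Error margin eps such that, with probability at least 1 - delta,
    the mean of n independent samples of a quantity with the given range
    is within eps of its true mean (standard Hoeffding bound)."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def hoeffding_sample_size(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding error margin is at most epsilon.
    This inverts the formula used in hoeffding_epsilon."""
    n = value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # Assumed illustration values: a statistic normalised to [0, 1],
    # 95% confidence (delta = 0.05), and a tolerated error of 0.05.
    n = hoeffding_sample_size(value_range=1.0, delta=0.05, epsilon=0.05)
    print("samples needed:", n)
    print("margin at that size:", hoeffding_epsilon(1.0, 0.05, n))
```

In a sampling step of this kind, such a bound could be used to decide how many points to draw before judging whether a dimension is suitable for clustering. How the article actually combines the bound with dimension selection is not stated in this record.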