A Clustering Algorithm Based on the Detection of Density Peaks and the Interaction Degree Between Clusters

Bibliographic Details
Main Authors: Yangming Liu, Jiaman Ding, Hongbin Wang, Yi Du
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/7/3612
Description
Summary: To handle data with irregular shapes and uneven density, this paper proposes a two-phase clustering algorithm based on detecting the peaks of dimensional density and the degree of interaction between clusters (CPDD-ID). In the partitioning phase, the local densities of the data in all dimensions are calculated using kernel density estimation, and density curves are constructed from the densities of all the data. The peaks of the density curves serve as benchmarks: a Kd-Tree is built to search for the data points closest to each peak, and these points form the initial sub-clusters. The final sub-clusters are then obtained by intersecting the initial sub-clusters from all dimensions. This partitioning strategy accurately identifies clusters with density differences and is effective on data with irregular shapes and uneven densities. In the merging phase, a new similarity measure based on the interaction degree between clusters is proposed. The method calculates the interaction degree of shared k-nearest neighbors between neighboring sub-clusters and iteratively merges the sub-clusters with maximum similarity. This similarity measure is effective when clusters overlap heavily or have ambiguous boundaries. The algorithm is tested in detail on 10 synthetic datasets and 10 real UCI datasets and compared with existing state-of-the-art algorithms. The experimental results show that CPDD-ID accurately identifies potential cluster structures and exhibits excellent performance in terms of clustering accuracy.
ISSN: 2076-3417
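
The partitioning phase described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a fixed Gaussian bandwidth and a uniform evaluation grid, and it replaces the Kd-Tree query with a plain linear nearest-peak search. The function names `kde_1d`, `density_peaks`, and `partition` are hypothetical, and the merging phase is not covered.

```python
import math
from collections import defaultdict


def kde_1d(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def density_peaks(values, bandwidth, grid_size=200):
    """Return grid positions where the 1-D density curve has a local maximum.

    Assumption: the curve is sampled on a uniform grid padded by three
    bandwidths on each side; the paper may construct its curves differently.
    """
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    dens = kde_1d(values, grid, bandwidth)
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if dens[i - 1] < dens[i] >= dens[i + 1]
    ]


def partition(points, bandwidth):
    """Group point indices into sub-clusters.

    In every dimension each point is assigned to its nearest density peak.
    Points that share the same peak in all dimensions form one sub-cluster,
    which is the intersection of the per-dimension partitions.
    """
    n_dims = len(points[0])
    keys = [[] for _ in points]
    for d in range(n_dims):
        column = [p[d] for p in points]
        peaks = density_peaks(column, bandwidth)
        for key, x in zip(keys, column):
            # Linear nearest-peak search; stands in for the Kd-Tree query.
            key.append(min(range(len(peaks)), key=lambda j: abs(peaks[j] - x)))
    clusters = defaultdict(list)
    for idx, key in enumerate(keys):
        clusters[tuple(key)].append(idx)
    return list(clusters.values())
```

Calling `partition(points, bandwidth=0.5)` on a list of coordinate tuples returns lists of point indices, one list per sub-cluster. The bandwidth controls how many peaks appear in each dimension, so in this sketch it directly determines how finely the data is split before any merging.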