DEALER: Distributed Clustering with Local Direction Centrality and Density Measure

Bibliographic Details
Main Authors: Xuze Liu, Ziqi Zhao, Yuhai Zhao
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/7/3988
Description
Summary: Clustering by Measuring Local Direction Centrality (CDC) is a recently proposed innovative clustering method. It identifies clusters by assessing the direction centrality of data points, i.e., the distribution of their $k$-nearest neighbors. Although CDC has shown promising results, it still faces challenges in terms of both effectiveness and efficiency. In this paper, we propose a novel algorithm, Distributed Clustering with Local Direction Centrality and Density Measure (DEALER). DEALER addresses the problem of weak connectivity by using a well-designed hybrid metric of direction centrality and density. In contrast to traditional density-based methods, this metric does not require a user-specified neighborhood radius, thus alleviating the parameter-setting burden on the user. Further, we propose a distributed clustering technique empowered by $z$-value filtering, which significantly reduces the cost of $k$-nearest neighbor computations in the direction centrality metric, lowering the time complexity from $O(n^2)$ to $O(n \log n)$. Extensive experiments on both real and synthetic datasets validate the effectiveness and efficiency of our proposed DEALER algorithm.
ISSN: 2076-3417
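
The summary names two building blocks without spelling them out: a direction centrality measure over a point's $k$-nearest neighbors, and $z$-value filtering. The following is a minimal Python sketch of one plausible reading of each, not the authors' implementation. It assumes 2D data, takes direction centrality to be the variance of the angular gaps between neighbor directions, and assumes "$z$-value" refers to a Z-order (Morton) code on integer grid coordinates. The function names `direction_centrality` and `z_value` are hypothetical.

```python
import math


def direction_centrality(point, neighbors):
    """Variance of the angular gaps between directions to the neighbors (2D).

    Assumption: a low value means the neighbors surround the point evenly
    (interior-like); a high value means they lie to one side (boundary-like).
    """
    k = len(neighbors)
    # Direction of each neighbor as seen from the point, sorted by angle.
    angles = sorted(
        math.atan2(ny - point[1], nx - point[0]) for nx, ny in neighbors
    )
    # Gaps between consecutive directions; the last gap wraps around the circle.
    gaps = [angles[(i + 1) % k] - angles[i] for i in range(k)]
    gaps[-1] += 2 * math.pi
    mean_gap = 2 * math.pi / k
    return sum((g - mean_gap) ** 2 for g in gaps) / k


def z_value(x, y, bits=16):
    """Interleave the bits of two non-negative integer grid coordinates.

    Assumption: this is the standard Z-order (Morton) code; x occupies the
    even bit positions and y the odd ones.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


if __name__ == "__main__":
    center = (0.0, 0.0)
    surrounded = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    one_sided = [(1, 0), (1, 0.1), (1, -0.1), (1, 0.2)]
    print(direction_centrality(center, surrounded))
    print(direction_centrality(center, one_sided))
    print(sorted([(3, 5), (0, 1), (1, 1), (2, 0)], key=lambda p: z_value(*p)))
```

Sorting points by such a code tends to keep spatially close points close together in the ordering, so neighbor candidates can be drawn from a short window around each point instead of comparing against all $n$ points. Sorting costs $O(n \log n)$, which matches the complexity quoted in the summary, but the summary does not say how DEALER applies the filter or distributes the work, so that connection is an assumption.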