Adaptive micro partition and hierarchical merging for accurate mixed data clustering

Abstract Heterogeneous attribute data (also called mixed data), characterized by attributes with numerical and categorical values, occur frequently across various scenarios. Since the annotation cost is high, clustering has emerged as a favorable technique for analyzing unlabeled mixed data. To addr...

Bibliographic Details
Main Authors: Yunfan Zhang, Rong Zou, Yiqun Zhang, Yue Zhang, Yiu-ming Cheung, Kangshun Li
Format: Article
Language: English
Published: Springer 2024-12-01
Series: Complex & Intelligent Systems
Subjects: Cluster analysis; Heterogeneous attributes; Neighborhood rough set; Unsupervised learning
Online Access: https://doi.org/10.1007/s40747-024-01695-7
author Yunfan Zhang
Rong Zou
Yiqun Zhang
Yue Zhang
Yiu-ming Cheung
Kangshun Li
author_sort Yunfan Zhang
collection DOAJ
description Heterogeneous attribute data (also called mixed data), characterized by attributes with numerical and categorical values, occur frequently across various scenarios. Because annotation is costly, clustering has emerged as a favorable technique for analyzing unlabeled mixed data. To address this complex real-world clustering task, the paper proposes a new clustering method called Adaptive Micro Partition and Hierarchical Merging (AMPHM), based on neighborhood rough set theory and a novel hierarchical merging mechanism. Specifically, the authors present a distance metric unified over numerical and categorical attributes, which lets neighborhood rough sets partition the data objects into fine-grained compact clusters. The current most similar clusters are then merged gradually, so that dissimilar objects are not incorporated into the same cluster. The proposed approach thereby overcomes the clustering performance bottleneck caused by the pre-set number of sought clusters k and by cluster distribution bias, and is thus capable of clustering datasets comprising various combinations of numerical and categorical attributes. Extensive experimental evaluations comparing the proposed AMPHM with state-of-the-art counterparts on various datasets demonstrate its superiority.
format Article
id doaj-art-0d8b4d539af24807b98b48e80864dff2
institution Kabale University
issn 2199-4536
2198-6053
language English
publishDate 2024-12-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj-art-0d8b4d539af24807b98b48e80864dff2
2025-02-02T12:49:41Z
eng
Springer
Complex & Intelligent Systems
2199-4536
2198-6053
2024-12-01
Volume 11, issue 1, pages 1-14
10.1007/s40747-024-01695-7
Adaptive micro partition and hierarchical merging for accurate mixed data clustering
Yunfan Zhang (School of Computer Science and Technology, Guangdong University of Technology)
Rong Zou (Department of Computer Science, Hong Kong Baptist University)
Yiqun Zhang (School of Computer Science and Technology, Guangdong University of Technology)
Yue Zhang (School of Computer Science, Guangdong Polytechnic Normal University)
Yiu-ming Cheung (Department of Computer Science, Hong Kong Baptist University)
Kangshun Li (College of Mathematics and Informatics, South China Agricultural University)
https://doi.org/10.1007/s40747-024-01695-7
Cluster analysis
Heterogeneous attributes
Neighborhood rough set
Unsupervised learning
title Adaptive micro partition and hierarchical merging for accurate mixed data clustering
topic Cluster analysis
Heterogeneous attributes
Neighborhood rough set
Unsupervised learning
url https://doi.org/10.1007/s40747-024-01695-7