Adaptive micro partition and hierarchical merging for accurate mixed data clustering
Abstract Heterogeneous attribute data (also called mixed data), characterized by attributes with numerical and categorical values, occur frequently across various scenarios. Since the annotation cost is high, clustering has emerged as a favorable technique for analyzing unlabeled mixed data. To addr...
Main Authors: | Yunfan Zhang, Rong Zou, Yiqun Zhang, Yue Zhang, Yiu-ming Cheung, Kangshun Li |
---|---|
Format: | Article |
Language: | English |
Published: | Springer 2024-12-01 |
Series: | Complex & Intelligent Systems |
Subjects: | Cluster analysis, Heterogeneous attributes, Neighborhood rough set, Unsupervised learning |
Online Access: | https://doi.org/10.1007/s40747-024-01695-7 |
_version_ | 1832571144053456896 |
---|---|
author | Yunfan Zhang Rong Zou Yiqun Zhang Yue Zhang Yiu-ming Cheung Kangshun Li |
author_sort | Yunfan Zhang |
collection | DOAJ |
description | Abstract Heterogeneous attribute data (also called mixed data), characterized by attributes with numerical and categorical values, occur frequently across various scenarios. Since the annotation cost is high, clustering has emerged as a favorable technique for analyzing unlabeled mixed data. To address the complex real-world clustering task, this paper proposes a new clustering method called Adaptive Micro Partition and Hierarchical Merging (AMPHM) based on neighborhood rough set theory and a novel hierarchical merging mechanism. Specifically, we present a distance metric unified on numerical and categorical attributes to leverage neighborhood rough sets in partitioning data objects into fine-grained compact clusters. Then, we gradually merge the current most similar clusters to avoid incorporating dissimilar objects into a similar cluster. It turns out that the proposed approach breaks through the clustering performance bottleneck brought by the pre-set number of sought clusters k and cluster distribution bias, and is thus capable of clustering datasets comprising various combinations of numerical and categorical attributes. Extensive experimental evaluations comparing the proposed AMPHM with state-of-the-art counterparts on various datasets demonstrate its superiority. |
format | Article |
id | doaj-art-0d8b4d539af24807b98b48e80864dff2 |
institution | Kabale University |
issn | 2199-4536 2198-6053 |
language | English |
publishDate | 2024-12-01 |
publisher | Springer |
record_format | Article |
series | Complex & Intelligent Systems |
spelling | doaj-art-0d8b4d539af24807b98b48e80864dff2; 2025-02-02T12:49:41Z; eng; Springer; Complex & Intelligent Systems; ISSN 2199-4536, 2198-6053; 2024-12-01; 11(1), 1-14; 10.1007/s40747-024-01695-7; Adaptive micro partition and hierarchical merging for accurate mixed data clustering; Yunfan Zhang (School of Computer Science and Technology, Guangdong University of Technology); Rong Zou (Department of Computer Science, Hong Kong Baptist University); Yiqun Zhang (School of Computer Science and Technology, Guangdong University of Technology); Yue Zhang (School of Computer Science, Guangdong Polytechnic Normal University); Yiu-ming Cheung (Department of Computer Science, Hong Kong Baptist University); Kangshun Li (College of Mathematics and Informatics, South China Agricultural University); https://doi.org/10.1007/s40747-024-01695-7; Cluster analysis; Heterogeneous attributes; Neighborhood rough set; Unsupervised learning |
title | Adaptive micro partition and hierarchical merging for accurate mixed data clustering |
topic | Cluster analysis Heterogeneous attributes Neighborhood rough set Unsupervised learning |
url | https://doi.org/10.1007/s40747-024-01695-7 |
work_keys_str_mv | AT yunfanzhang adaptivemicropartitionandhierarchicalmergingforaccuratemixeddataclustering AT rongzou adaptivemicropartitionandhierarchicalmergingforaccuratemixeddataclustering AT yiqunzhang adaptivemicropartitionandhierarchicalmergingforaccuratemixeddataclustering AT yuezhang adaptivemicropartitionandhierarchicalmergingforaccuratemixeddataclustering AT yiumingcheung adaptivemicropartitionandhierarchicalmergingforaccuratemixeddataclustering AT kangshunli adaptivemicropartitionandhierarchicalmergingforaccuratemixeddataclustering |
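
The description field outlines a two-stage pipeline: a distance unified over numerical and categorical attributes drives a fine-grained partition, and the most similar clusters are then merged step by step. The record does not give the actual AMPHM formulas, so the following is only a minimal sketch of that general shape under stated assumptions: a Gower-style distance stands in for the paper's metric, a greedy radius-based grouping stands in for the neighborhood rough set partition, and average linkage with a stop threshold stands in for the merging mechanism. The names `mixed_distance`, `micro_partition`, `hierarchical_merge`, `radius`, and `stop` are all hypothetical.

```python
from itertools import combinations


def mixed_distance(a, b, num_idx, cat_idx, ranges):
    """Gower-style distance in [0, 1] (an assumed stand-in, not the paper's metric)."""
    total = 0.0
    for j in num_idx:
        # Numerical attribute: absolute difference scaled by the attribute range.
        total += abs(a[j] - b[j]) / ranges[j] if ranges[j] > 0 else 0.0
    for j in cat_idx:
        # Categorical attribute: simple mismatch indicator.
        total += 0.0 if a[j] == b[j] else 1.0
    return total / (len(num_idx) + len(cat_idx))


def micro_partition(data, dist, radius):
    """Greedy grouping: each seed collects the unassigned objects within `radius`."""
    unassigned = list(range(len(data)))
    clusters = []
    while unassigned:
        seed = unassigned[0]
        members = [i for i in unassigned if dist(data[seed], data[i]) <= radius]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters


def average_linkage(c1, c2, data, dist):
    """Mean pairwise distance between two clusters of object indices."""
    return sum(dist(data[i], data[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def hierarchical_merge(clusters, data, dist, stop):
    """Merge the closest pair repeatedly until the closest pair is farther than `stop`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        (p, q), best = min(
            (((p, q), average_linkage(clusters[p], clusters[q], data, dist))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > stop:
            break
        clusters[p].extend(clusters[q])  # p < q, so deleting q keeps p valid
        del clusters[q]
    return clusters


# Toy mixed data: one numerical attribute and one categorical attribute.
data = [(1.0, "red"), (1.2, "red"), (1.1, "red"),
        (9.0, "blue"), (9.3, "blue"), (8.8, "blue")]
num_idx, cat_idx = [0], [1]
ranges = {j: max(r[j] for r in data) - min(r[j] for r in data) for j in num_idx}


def dist(a, b):
    return mixed_distance(a, b, num_idx, cat_idx, ranges)


micro = micro_partition(data, dist, radius=0.01)
final = hierarchical_merge(micro, data, dist, stop=0.2)
print(micro)
print(final)
```

On this toy data the small radius yields five micro clusters, and merging stops once only the "red" and "blue" groups remain, without a preset number of clusters. The stop threshold here is an illustrative substitute; how AMPHM decides when to stop merging is not stated in this record.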