Application of K-means supported by clustered systems in big data association rule mining

Abstract: Association rule mining plays an important role in data mining, where it is used to discover hidden relationships. However, as data volumes increase, traditional association rule mining methods are constrained to single-machine computing when processing large-scale data. These m...

Bibliographic Details
Main Author: Lihua Liu
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: Systems and Soft Computing
Subjects: Cluster systems; K-means; Association rules; Frequent item sets; Mining algorithms
Online Access: http://www.sciencedirect.com/science/article/pii/S2772941925000298
_version_ 1850153974120841216
author Lihua Liu
author_facet Lihua Liu
author_sort Lihua Liu
collection DOAJ
description Abstract: Association rule mining plays an important role in data mining, where it is used to discover hidden relationships. However, as data volumes increase, traditional association rule mining methods remain constrained to single-machine computing. They cannot leverage modern distributed computing frameworks, which creates significant performance bottlenecks on large-scale datasets. Research on combining distributed computing technology with association rule mining has therefore become key to improving efficiency and scalability. To this end, the study introduced a parallel frequent itemset mining technique, FiDoop DP, which used the MapReduce programming paradigm for data partitioning on Hadoop clusters and integrated an improved k-means++ algorithm for data preprocessing. In performance validation, the improved k-means++ clustering method achieved a Davies-Bouldin index of 0.642 and a Calinski-Harabasz score of 5186, indicating favorable clustering results, while the data partitioning method based on parallel frequent itemset mining showed a notable performance advantage. With 60 seed points, the execution time of the parallel frequent itemset mining technique was 683 s, the mining duration was 402 s, and the shuffling cost amounted to 2280 GB. These results indicate that the proposed FiDoop DP method is well suited to modern cluster environments. By combining the distributed computing capabilities of Hadoop clusters with the improved k-means++ clustering algorithm, the method effectively addresses the scalability problem in processing large datasets and significantly improves the efficiency of clustering analysis and frequent itemset mining.
format Article
id doaj-art-19f86883e1eb43e0acbf4af39296f61d
institution OA Journals
issn 2772-9419
language English
publishDate 2025-12-01
publisher Elsevier
record_format Article
series Systems and Soft Computing
spelling doaj-art-19f86883e1eb43e0acbf4af39296f61d
2025-08-20T02:25:35Z
eng
Elsevier
Systems and Soft Computing
2772-9419
2025-12-01
7
200211
10.1016/j.sasc.2025.200211
Application of K-means supported by clustered systems in big data association rule mining
Lihua Liu
Software Engineering Department, Hebei Software Institute, Baoding 071000, China
http://www.sciencedirect.com/science/article/pii/S2772941925000298
Cluster systems
K-means
Association rules
Frequent item sets
Mining algorithms
title Application of K-means supported by clustered systems in big data association rule mining
topic Cluster systems
K-means
Association rules
Frequent item sets
Mining algorithms
url http://www.sciencedirect.com/science/article/pii/S2772941925000298