Application of K-means supported by clustered systems in big data association rule mining

Abstract: Association rule mining plays an important role in the field of data mining, where it is used to discover hidden relationships. However, as data volumes increase, traditional association rule mining methods are constrained to single-machine computing when processing large-scale data. These m...

Bibliographic Details
Main Author:	Lihua Liu
Format:	Article
Language:	English
Published:	Elsevier 2025-12-01
Series:	Systems and Soft Computing
Subjects:	Cluster systems; K-means; Association rules; Frequent item sets; Mining algorithms
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772941925000298

author	Lihua Liu
collection	DOAJ
description	Abstract: Association rule mining, which discovers hidden relationships in data, plays an important role in data mining. However, traditional association rule mining methods are confined to single-machine computing and cannot leverage modern distributed computing frameworks, so their performance bottlenecks become more severe as datasets grow. Combining distributed computing technology with association rule mining has therefore become key to improving efficiency and scalability. To this end, the study introduced a parallel frequent itemset mining technique, FiDoop DP, which used the MapReduce programming paradigm for data partitioning on Hadoop clusters and integrated an improved k-means++ algorithm for data preprocessing. In performance validation, the improved k-means++ clustering method achieved a Davies-Bouldin index of 0.642 and a Calinski-Harabasz score of 5186, indicating favorable clustering results, while the data partitioning method based on parallel frequent itemset mining showed a notable performance advantage. With 60 seed points, the execution time of the parallel frequent itemset mining technique was 683 s, the mining duration was 402 s, and the shuffling cost was 2280 GB. These results indicate that the proposed FiDoop DP method is of significant value in modern cluster environments. By combining the distributed computing capabilities of Hadoop clusters with the improved k-means++ clustering algorithm, the method effectively addresses the scalability problem in processing large datasets and significantly improves the efficiency of clustering analysis and frequent itemset mining.
format	Article
id	doaj-art-19f86883e1eb43e0acbf4af39296f61d
institution	OA Journals
issn	2772-9419
language	English
publishDate	2025-12-01
publisher	Elsevier
record_format	Article
series	Systems and Soft Computing
spelling	doaj-art-19f86883e1eb43e0acbf4af39296f61d; 2025-08-20T02:25:35Z; eng; Elsevier; Systems and Soft Computing; 2772-9419; 2025-12-01; 7; 200211; 10.1016/j.sasc.2025.200211; Application of K-means supported by clustered systems in big data association rule mining; Lihua Liu (Software Engineering Department, Hebei Software Institute, Baoding 071000, China); http://www.sciencedirect.com/science/article/pii/S2772941925000298; Cluster systems; K-means; Association rules; Frequent item sets; Mining algorithms
title	Application of K-means supported by clustered systems in big data association rule mining
topic	Cluster systems; K-means; Association rules; Frequent item sets; Mining algorithms
url	http://www.sciencedirect.com/science/article/pii/S2772941925000298
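
The description above mentions k-means++ clustering validated with a Davies-Bouldin index. As a rough illustration only, the sketch below shows standard k-means++ seeding, plain Lloyd iterations, and the Davies-Bouldin index in single-machine Python. It is not the article's improved k-means++ variant and does not involve Hadoop or MapReduce; the function names (`kmeanspp_seeds`, `kmeans`, `davies_bouldin`) and the toy data are assumptions made for this example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Standard k-means++ seeding (assumes at least k distinct points).

    Each new seed is drawn with probability proportional to its squared
    distance from the nearest seed chosen so far.
    """
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        weights = [min(sq_dist(p, s) for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest current center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: labels match the returned centers
        centers = new_centers
    return centers, labels


def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index (lower means tighter, better separated clusters).

    Assumes at least two clusters with distinct centers.
    """
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        dists = [math.sqrt(sq_dist(p, centers[j])) for p in members]
        scatter.append(sum(dists) / len(dists) if dists else 0.0)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.sqrt(sq_dist(centers[i], centers[j]))
            for j in range(k) if j != i
        )
    return total / k


if __name__ == "__main__":
    # Toy data: two small, well-separated groups of 2-D points.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    found_centers, found_labels = kmeans(data, k=2)
    print(found_centers, davies_bouldin(data, found_centers, found_labels))
```

In a cluster setting the assignment step would typically be distributed across workers, which this in-memory sketch does not attempt.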

Similar Items