Application of K-means supported by clustered systems in big data association rule mining
Abstract: Association rule mining plays an important role in data mining, where it is used to discover hidden relationships. However, as data volumes increase, traditional association rule mining methods are constrained to single-machine computing when processing large-scale data. These m...
| Main Author: | Lihua Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-12-01 |
| Series: | Systems and Soft Computing |
| Subjects: | Cluster systems; K-means; Association rules; Frequent item sets; Mining algorithms |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2772941925000298 |
| _version_ | 1850153974120841216 |
|---|---|
| author | Lihua Liu |
| author_facet | Lihua Liu |
| author_sort | Lihua Liu |
| collection | DOAJ |
| description | Abstract: Association rule mining plays an important role in data mining, where it is used to discover hidden relationships. However, as data volumes increase, traditional association rule mining methods remain constrained to single-machine computing. They cannot leverage modern distributed computing frameworks, which creates significant performance bottlenecks on large-scale datasets. Combining distributed computing technology with association rule mining has therefore become key to improving efficiency and scalability. To this end, the study introduced a parallel frequent itemset mining technique, FiDoop DP, which used the MapReduce programming paradigm for data partitioning on Hadoop clusters and integrated an improved k-means++ algorithm for data preprocessing to provide better data processing results. In performance validation, the enhanced k-means++ clustering method achieved a Davies-Bouldin index of 0.642 and a Calinski-Harabasz score of 5186. The improved k-means++ technique produced favorable clustering results, and the data partitioning method based on parallel frequent itemset mining showed a notable performance advantage. With 60 seed points, the execution time of the parallel frequent itemset mining technique was 683 s, the mining duration was 402 s, and the shuffling cost amounted to 2280 GB. These results indicate that the FiDoop DP method proposed by the study is of significant value in modern cluster environments. By combining the distributed computing capabilities of Hadoop clusters with the improved k-means++ clustering algorithm, the method effectively addresses the scalability problem in processing large datasets and significantly improves the efficiency of clustering analysis and frequent itemset mining. |
| format | Article |
| id | doaj-art-19f86883e1eb43e0acbf4af39296f61d |
| institution | OA Journals |
| issn | 2772-9419 |
| language | English |
| publishDate | 2025-12-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Systems and Soft Computing |
| spelling | doaj-art-19f86883e1eb43e0acbf4af39296f61d; 2025-08-20T02:25:35Z; eng; Elsevier; Systems and Soft Computing; 2772-9419; 2025-12-01; 7; 200211; 10.1016/j.sasc.2025.200211; Application of K-means supported by clustered systems in big data association rule mining; Lihua Liu (Software Engineering Department, Hebei Software Institute, Baoding 071000, China); http://www.sciencedirect.com/science/article/pii/S2772941925000298; Cluster systems; K-means; Association rules; Frequent item sets; Mining algorithms |
| spellingShingle | Lihua Liu Application of K-means supported by clustered systems in big data association rule mining Systems and Soft Computing Cluster systems K-means Association rules Frequent item sets Mining algorithms |
| title | Application of K-means supported by clustered systems in big data association rule mining |
| title_full | Application of K-means supported by clustered systems in big data association rule mining |
| title_fullStr | Application of K-means supported by clustered systems in big data association rule mining |
| title_full_unstemmed | Application of K-means supported by clustered systems in big data association rule mining |
| title_short | Application of K-means supported by clustered systems in big data association rule mining |
| title_sort | application of k means supported by clustered systems in big data association rule mining |
| topic | Cluster systems; K-means; Association rules; Frequent item sets; Mining algorithms |
| url | http://www.sciencedirect.com/science/article/pii/S2772941925000298 |
| work_keys_str_mv | AT lihualiu applicationofkmeanssupportedbyclusteredsystemsinbigdataassociationrulemining |
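
The description field above mentions k-means++ clustering and a Davies-Bouldin index as a validation measure. The record does not describe the study's improved variant, so the following is only a minimal sketch of the standard k-means++ seeding step, plain Lloyd iterations, and the textbook Davies-Bouldin definition. All function names (`kmeanspp_seeds`, `assign`, `kmeans`, `davies_bouldin`) and the toy data are hypothetical, and the sketch assumes the input contains at least `k` distinct points.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Standard k-means++ seeding (not the study's improved variant).

    Each new seed is sampled with probability proportional to its squared
    distance from the nearest seed chosen so far.
    """
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        weights = [min(sq_dist(p, s) for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def centroid(cluster):
    """Coordinate-wise mean of a non-empty cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def kmeans(points, k, rng, iterations=20):
    """Lloyd iterations started from k-means++ seeds."""
    centers = kmeanspp_seeds(points, k, rng)
    for _ in range(iterations):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(points, centers)


def davies_bouldin(clusters):
    """Textbook Davies-Bouldin index; lower values mean tighter, better
    separated clusters. Empty clusters are ignored."""
    clusters = [c for c in clusters if c]
    centers = [centroid(c) for c in clusters]
    # Mean distance of each cluster's points to its own centroid.
    scatter = [
        sum(math.sqrt(sq_dist(p, m)) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.sqrt(sq_dist(centers[i], centers[j]))
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Toy data: three small, well-separated groups (illustrative only).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (50, 50), (50, 51), (51, 50), (51, 51),
    (100, 0), (100, 1), (101, 0), (101, 1),
]
centers, clusters = kmeans(points, 3, random.Random(0))
print("cluster sizes:", sorted(len(c) for c in clusters))
print("Davies-Bouldin index:", round(davies_bouldin(clusters), 3))
```

In a distributed setting such as the Hadoop cluster mentioned in the description, the assignment step would typically run in mappers and the centroid update in reducers; that mapping is a common pattern and is not confirmed by this record as the study's design.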