A Robust Technique for Closed Frequent and High Utility Itemsets Mining: Closed-FHUIM
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2024-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10810425/ |
Summary: | Frequent itemset mining (FIM) and high utility itemset mining (HUIM) are popular data mining techniques used in real-world applications such as retail market analysis, biomedicine, and click-stream analysis. However, each technique has limitations. HUIM ignores support, defined as the frequency of an itemset in the database, and so omits frequently occurring itemsets. Similarly, FIM overlooks the utility measure, which quantifies the importance or profit of an itemset, and so cannot identify high utility itemsets. Additionally, current approaches often generate an extensive set of itemsets, resulting in redundancy and increased computational and memory demands. To address these challenges, this paper presents the Closed Frequent and High Utility Itemset Miner (Closed-FHUIM) algorithm, which performs frequent and high utility itemset mining concurrently and produces a concise list of itemsets, reducing redundancy and improving efficiency. In Closed-FHUIM, we introduce a novel pruning technique that balances utility and support, and we adjust the sub-tree utility concept by incorporating the support measure. These techniques minimize computational resource use while ensuring that the itemsets meet both frequency and utility requirements. We evaluate the proposed approach on sparse, dense, and very large datasets. Experimental results show that our algorithm outperforms existing closure-based state-of-the-art algorithms by up to two orders of magnitude while consuming significantly less memory. |
---|---|
ISSN: | 2169-3536 |
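
The summary's three notions (support, utility, and closedness) can be illustrated with a small brute-force sketch. This is not the Closed-FHUIM algorithm: it enumerates every itemset and applies none of the paper's pruning or sub-tree utility techniques. The toy database, the profit table, the thresholds, and all function names are hypothetical, and "closed" is assumed here to mean the usual definition (no proper superset has the same support).

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "b": 1},
    {"a": 1, "b": 1, "c": 3},
    {"c": 2, "d": 1},
]
# Hypothetical per-unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 10}


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def utility(itemset, db, profit):
    """Sum of quantity * unit profit over the transactions containing the itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if itemset.issubset(t)
    )


def closed_frequent_high_utility(db, profit, min_support, min_utility):
    """Naive enumeration: keep itemsets that pass both thresholds and are closed."""
    items = sorted({i for t in db for i in t})
    all_sets = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    ]
    sup = {s: support(s, db) for s in all_sets}
    kept = {}
    for s in all_sets:
        if sup[s] < min_support:
            continue  # not frequent
        u = utility(s, db, profit)
        if u < min_utility:
            continue  # not high utility
        # Assumed closure test: drop s if a proper superset has equal support.
        if any(s < other and sup[other] == sup[s] for other in all_sets):
            continue
        kept[s] = (sup[s], u)
    return kept


result = closed_frequent_high_utility(
    transactions, unit_profit, min_support=2, min_utility=10
)
for itemset, (s, u) in result.items():
    print(sorted(itemset), "support:", s, "utility:", u)
```

With these toy thresholds, the itemset {c} is frequent but falls below the utility threshold, and {c, d} reaches the utility threshold but is not frequent, which is the kind of gap between FIM and HUIM that the summary describes. The exponential enumeration is only workable for a handful of items.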