Frequent-pattern discovering algorithm for large-scale corpus
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial Department of Journal on Communications, 2007-01-01 |
Series: | Tongxin xuebao |
Online Access: | http://www.joconline.com.cn/zh/article/74656928/ |
Summary: | A memory-based frequent-pattern discovery algorithm for large-scale corpora is presented. First, the original corpus is partitioned into several parts using an appropriate dividing policy. Each partition is then processed independently to produce a temporary result, and the union of all temporary results is the final frequent-pattern set. The algorithm prunes a subtree as soon as it is certain that none of the corresponding patterns can be frequent. Experiments show that discovering all patterns that appear more than 100 times in a 3.6-gigabyte news corpus takes no more than 1.6 gigabytes of memory, at an average speed of 3.28 megabytes per second. |
---|---|
ISSN: | 1000-436X |
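
The partition, process, and union steps described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the record does not state the actual dividing policy or pattern definition. The sketch assumes that patterns are contiguous character substrings, that the corpus is divided by the first symbol of each pattern (so every occurrence of a pattern falls in exactly one partition and the union of the temporary results is exact), and that `min_count` is an inclusive threshold. The names `partition_by_first_symbol`, `mine_partition`, `frequent_patterns`, and `max_len` are hypothetical.

```python
from collections import defaultdict


def partition_by_first_symbol(text):
    """Assumed dividing policy: group start positions by their first symbol."""
    parts = defaultdict(list)
    for i, ch in enumerate(text):
        parts[ch].append(i)
    return parts


def mine_partition(text, prefix, positions, min_count, max_len, out):
    """Grow one prefix depth-first; `positions` holds the start of each occurrence."""
    if len(positions) < min_count:
        # Prune the subtree: an extension can never occur more often than its prefix.
        return
    out[prefix] = len(positions)
    if len(prefix) == max_len:
        return
    # Split the occurrences by the symbol that follows the current prefix.
    children = defaultdict(list)
    for p in positions:
        nxt = p + len(prefix)
        if nxt < len(text):
            children[text[nxt]].append(p)
    for ch, child_positions in children.items():
        mine_partition(text, prefix + ch, child_positions, min_count, max_len, out)


def frequent_patterns(text, min_count, max_len):
    """Mine each partition independently and merge the temporary results."""
    result = {}
    for ch, positions in partition_by_first_symbol(text).items():
        temp = {}  # temporary result for this partition only
        mine_partition(text, ch, positions, min_count, max_len, temp)
        result.update(temp)  # partitions share no patterns, so the union is a plain merge
    return result


if __name__ == "__main__":
    print(frequent_patterns("abababc", min_count=2, max_len=3))
```

Overlapping occurrences are counted separately in this sketch. Because only one partition's temporary result is built at a time, peak memory in a real system would depend on the largest partition rather than on the whole corpus, which is presumably the motivation for the partitioning step.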