Frequent-pattern discovering algorithm for large-scale corpus



Bibliographic Details
Main Authors: GONG Cai-chun, HE Min, CHEN Hai-qiang, XU Hong-bo, CHENG Xue-qi
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2007-01-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/zh/article/74656928/
Description
Summary: A memory-based frequent-pattern discovery algorithm for large-scale corpora was presented. First, the original corpus was partitioned into several parts using an appropriate dividing policy. Each partition was then processed independently to produce a temporary result, and the union of all temporary results is the final frequent-pattern set. The algorithm prunes a subtree as soon as it is certain that none of the corresponding patterns can be frequent. Experiments show that discovering all patterns that appear more than 100 times in a 3.6-gigabyte news corpus takes no more than 1.6 gigabytes of memory, at an average speed of 3.28 megabytes per second.
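The record does not spell out the dividing policy or the tree structure, so the following is only a minimal sketch of the partition-then-union idea under stated assumptions. It assumes that patterns are character substrings (with overlapping occurrences counted), and that the hypothetical dividing policy splits patterns by their first character, so every pattern belongs to exactly one partition and the union of partition results is exact. It also assumes a hypothetical `max_len` cap. Pruning relies on the fact that an extension of a pattern can never occur more often than the pattern itself, so a pattern below the threshold lets its whole subtree be skipped. The function names `partition_first_chars`, `mine_partition`, and `mine` are illustrative, not the authors'.

```python
from collections import defaultdict


def partition_first_chars(docs, k):
    """Hypothetical dividing policy: split the set of first characters into k groups."""
    chars = sorted({ch for text in docs for ch in text})
    return [set(chars[i::k]) for i in range(k)]


def mine_partition(docs, first_chars, min_count, max_len):
    """Count frequent substrings (overlaps included) that start with a char in first_chars."""
    # Each frontier entry maps a pattern to its occurrences: (doc index, start offset).
    frontier = defaultdict(list)
    for d, text in enumerate(docs):
        for i, ch in enumerate(text):
            if ch in first_chars:
                frontier[ch].append((d, i))

    result = {}
    length = 1
    while frontier:
        next_frontier = defaultdict(list)
        for pat, occ in frontier.items():
            if len(occ) < min_count:
                # Prune: every extension of pat occurs at most len(occ) times,
                # so the whole subtree below pat can be discarded.
                continue
            result[pat] = len(occ)
            if length == max_len:
                continue  # hypothetical length cap: do not extend further
            for d, i in occ:
                j = i + length
                if j < len(docs[d]):
                    next_frontier[pat + docs[d][j]].append((d, i))
        frontier = next_frontier
        length += 1
    return result


def mine(docs, min_count, max_len, k):
    """Mine each partition independently; partitions share no pattern, so the union is exact."""
    result = {}
    for part in partition_first_chars(docs, k):
        result.update(mine_partition(docs, part, min_count, max_len))
    return result


if __name__ == "__main__":
    docs = ["abab", "abc"]
    print(mine(docs, min_count=2, max_len=3, k=2))  # → {'a': 3, 'ab': 3, 'b': 3}
```

Because the partitions share no state, each call to `mine_partition` could run in a separate process, and only one partition's occurrence lists need to be held in memory at a time. In this sketch, the abstract's "more than 100 times" threshold would correspond to `min_count=101`.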
ISSN:1000-436X