Frequent-pattern discovering algorithm for large-scale corpus
A memory-based frequent-pattern discovery algorithm for large-scale corpora is presented. First, the original corpus is partitioned into several parts using an appropriate dividing policy. Then each partition is processed independently to produce a temporary result, and the union of all temporary res...
Main Authors: | GONG Cai-chun, HE Min, CHEN Hai-qiang, XU Hong-bo, CHENG Xue-qi |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial Department of Journal on Communications, 2007-01-01 |
Series: | Tongxin xuebao |
Subjects: | frequent pattern; corpus partition; repeat |
Online Access: | http://www.joconline.com.cn/zh/article/74656928/ |
_version_ | 1841537441954004992 |
---|---|
author | GONG Cai-chun; HE Min; CHEN Hai-qiang; XU Hong-bo; CHENG Xue-qi |
author_facet | GONG Cai-chun; HE Min; CHEN Hai-qiang; XU Hong-bo; CHENG Xue-qi |
author_sort | GONG Cai-chun |
collection | DOAJ |
description | A memory-based frequent-pattern discovery algorithm for large-scale corpora is presented. First, the original corpus is partitioned into several parts using an appropriate dividing policy. Then each partition is processed independently to produce a temporary result, and the union of all temporary results is the final frequent-pattern set. The algorithm prunes a subtree once it is certain that none of the corresponding patterns will be frequent. Experiments show that it takes no more than 1.6 gigabytes of memory to discover all patterns appearing more than 100 times in a 3.6-gigabyte news corpus, at an average speed of 3.28 megabytes per second. |
format | Article |
id | doaj-art-42175ba11a2e477cac121910cd65b603 |
institution | Kabale University |
issn | 1000-436X |
language | zho |
publishDate | 2007-01-01 |
publisher | Editorial Department of Journal on Communications |
record_format | Article |
series | Tongxin xuebao |
spelling | doaj-art-42175ba11a2e477cac121910cd65b603; 2025-01-14T08:35:11Z; zho; Editorial Department of Journal on Communications; Tongxin xuebao; 1000-436X; 2007-01-01; 161166; 74656928; Frequent-pattern discovering algorithm for large-scale corpus; GONG Cai-chun; HE Min; CHEN Hai-qiang; XU Hong-bo; CHENG Xue-qi; A memory-based frequent-pattern discovery algorithm for large-scale corpora is presented. First, the original corpus is partitioned into several parts using an appropriate dividing policy. Then each partition is processed independently to produce a temporary result, and the union of all temporary results is the final frequent-pattern set. The algorithm prunes a subtree once it is certain that none of the corresponding patterns will be frequent. Experiments show that it takes no more than 1.6 gigabytes of memory to discover all patterns appearing more than 100 times in a 3.6-gigabyte news corpus, at an average speed of 3.28 megabytes per second.; http://www.joconline.com.cn/zh/article/74656928/; frequent pattern; corpus partition; repeat |
spellingShingle | GONG Cai-chun; HE Min; CHEN Hai-qiang; XU Hong-bo; CHENG Xue-qi; Frequent-pattern discovering algorithm for large-scale corpus; Tongxin xuebao; frequent pattern; corpus partition; repeat |
title | Frequent-pattern discovering algorithm for large-scale corpus |
title_full | Frequent-pattern discovering algorithm for large-scale corpus |
title_fullStr | Frequent-pattern discovering algorithm for large-scale corpus |
title_full_unstemmed | Frequent-pattern discovering algorithm for large-scale corpus |
title_short | Frequent-pattern discovering algorithm for large-scale corpus |
title_sort | frequent pattern discovering algorithm for large scale corpus |
topic | frequent pattern; corpus partition; repeat |
url | http://www.joconline.com.cn/zh/article/74656928/ |
work_keys_str_mv | AT gongcaichun1 frequentpatterndiscoveringalgorithmforlargescalecorpus AT hemin1 frequentpatterndiscoveringalgorithmforlargescalecorpus AT chenhaiqiang1 frequentpatterndiscoveringalgorithmforlargescalecorpus AT xuhongbo1 frequentpatterndiscoveringalgorithmforlargescalecorpus AT chengxueqi1 frequentpatterndiscoveringalgorithmforlargescalecorpus |
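
The partition-then-union scheme summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not state the dividing policy, the pattern definition, or the tree structure, so the sketch assumes that patterns are contiguous character substrings, that start positions are partitioned by their first character (so every occurrence of a pattern falls into exactly one partition), and that a pattern is frequent when it occurs at least `min_count` times, counting overlapping occurrences. The names `partition_positions`, `mine_partition`, `frequent_patterns`, `num_parts`, and `max_len` are hypothetical. Pruning relies on the fact that an extension of a pattern cannot occur more often than the pattern itself.

```python
from collections import defaultdict


def partition_positions(text, num_parts):
    """Assumed dividing policy: group start positions by first character.

    All occurrences of a pattern share its first character, so each
    pattern is counted entirely inside one partition.
    """
    parts = [[] for _ in range(num_parts)]
    for i, ch in enumerate(text):
        parts[ord(ch) % num_parts].append(i)
    return parts


def mine_partition(text, positions, min_count, max_len):
    """Grow patterns one character at a time within a single partition."""
    frequent = {}
    # Map each candidate pattern to the start positions where it occurs.
    groups = defaultdict(list)
    for i in positions:
        groups[text[i]].append(i)
    length = 1
    while groups and length <= max_len:
        next_groups = defaultdict(list)
        for pattern, starts in groups.items():
            if len(starts) < min_count:
                # Prune: no extension of an infrequent pattern is frequent.
                continue
            frequent[pattern] = len(starts)
            if length < max_len:
                for i in starts:
                    end = i + length
                    if end < len(text):
                        next_groups[text[i:end + 1]].append(i)
        groups = next_groups
        length += 1
    return frequent


def frequent_patterns(text, min_count, num_parts=4, max_len=8):
    """Union of the per-partition results (keys never collide)."""
    result = {}
    for positions in partition_positions(text, num_parts):
        result.update(mine_partition(text, positions, min_count, max_len))
    return result


if __name__ == "__main__":
    print(frequent_patterns("abcabcabd", min_count=2, max_len=3))
```

Because the partitions are disjoint by construction, the result does not depend on `num_parts`; only the size of the per-partition working set changes. The sketch keeps the whole text in memory and bounds the pattern length with `max_len`, both simplifications that the record neither confirms nor rules out.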