Frequent-pattern discovering algorithm for large-scale corpus

Bibliographic Details
Main Authors: GONG Cai-chun, HE Min, CHEN Hai-qiang, XU Hong-bo, CHENG Xue-qi
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications, 2007-01-01
Series: Tongxin xuebao (Journal on Communications)
Online Access: http://www.joconline.com.cn/zh/article/74656928/
Collection: DOAJ
Abstract: A memory-based frequent-pattern discovery algorithm for large-scale corpora is presented. First, the original corpus is partitioned into several parts using an appropriate dividing policy. Each partition is then processed independently to produce a temporary result, and the union of all temporary results is the final frequent-pattern set. The algorithm prunes a subtree as soon as it is certain that none of the corresponding patterns can be frequent. Experiments show that discovering all patterns that appear more than 100 times in a 3.6-gigabyte news corpus requires no more than 1.6 gigabytes of memory, at an average speed of 3.28 megabytes per second.
ISSN: 1000-436X
Keywords: frequent pattern; corpus partition; repeat
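The abstract only outlines the method, so the Python sketch here illustrates its two named ideas, partition-then-union and subtree pruning, under assumptions that are ours and not the authors'. It assumes that patterns are contiguous character substrings up to a hypothetical `max_len`, and that the "dividing policy" is a hypothetical split of the pattern space by first character (`partition_by_first_char`). That split is one way to make the union of per-partition results exactly equal to the global frequent set. The pruning step uses the standard anti-monotone rule: a substring can occur no more often than its own prefix, so once a prefix falls below the threshold, its whole subtree in the prefix trie can be skipped. All function and parameter names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set


def partition_by_first_char(text: str, parts: int) -> List[Set[str]]:
    """Hypothetical dividing policy: deal the distinct characters round-robin
    into `parts` groups. Each pattern belongs to exactly one group (the one
    holding its first character), so per-group results never overlap."""
    groups: List[Set[str]] = [set() for _ in range(parts)]
    for i, ch in enumerate(sorted(set(text))):
        groups[i % parts].add(ch)
    return [g for g in groups if g]


def frequent_in_partition(text: str, first_chars: Set[str],
                          min_count: int, max_len: int) -> Dict[str, int]:
    """Find substrings that start with a character in `first_chars` and occur
    at least `min_count` times (overlapping occurrences included). Patterns
    grow one character per level, like a level-by-level walk of a prefix trie."""
    result: Dict[str, int] = {}
    # Level 1: start offsets of every single-character pattern in this partition.
    frontier: Dict[str, List[int]] = defaultdict(list)
    for i, ch in enumerate(text):
        if ch in first_chars:
            frontier[ch].append(i)
    length = 1
    while frontier and length <= max_len:
        next_frontier: Dict[str, List[int]] = defaultdict(list)
        for pattern, starts in frontier.items():
            if len(starts) < min_count:
                # Prune: no extension of `pattern` can reach min_count.
                continue
            result[pattern] = len(starts)
            if length < max_len:
                # Extend each occurrence by the character that follows it.
                for s in starts:
                    if s + length < len(text):
                        next_frontier[pattern + text[s + length]].append(s)
        frontier = next_frontier
        length += 1
    return result


def frequent_patterns(text: str, min_count: int, max_len: int,
                      parts: int = 4) -> Dict[str, int]:
    """Process each partition independently, then take the union of results."""
    final: Dict[str, int] = {}
    for group in partition_by_first_char(text, parts):
        final.update(frequent_in_partition(text, group, min_count, max_len))
    return final


if __name__ == "__main__":
    print(frequent_patterns("abcabcabd", min_count=2, max_len=3, parts=2))
```

In this sketch, the abstract's threshold of "more than 100 times" corresponds to `min_count=101`. The split is by first character rather than by document because a plain document split would not be exact: a pattern that occurs 101 times overall could fall short of the threshold in every individual chunk. For clarity, each partition here still scans the whole text. A memory-conscious version would presumably materialise only the suffixes belonging to the current partition.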