Frequent-pattern discovering algorithm for large-scale corpus



Bibliographic Details
Main Authors:	GONG Cai-chun, HE Min, CHEN Hai-qiang, XU Hong-bo, CHENG Xue-qi
Format:	Article
Language:	Chinese (zho)
Published:	Editorial Department of Journal on Communications 2007-01-01
Series:	Tongxin xuebao
Subjects:	frequent pattern; corpus partition; repeat
Online Access:	http://www.joconline.com.cn/zh/article/74656928/

collection	DOAJ
description	A memory-based frequent-pattern discovering algorithm for large-scale corpora was presented. First, the original corpus was partitioned into several parts using an appropriate dividing policy. Then each partition was processed independently to produce a temporary result, and the union of all temporary results forms the final frequent-pattern set. The algorithm prunes a subtree as soon as it is certain that none of the corresponding patterns can be frequent. Experiments show that discovering all patterns that appear more than 100 times in a 3.6-gigabyte news corpus takes no more than 1.6 gigabytes of memory, at an average speed of 3.28 megabytes per second.
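The abstract does not specify the dividing policy or the data structure, so the Python sketch below fills both in with its own assumptions and should not be read as the authors' method. It partitions candidate start positions by their first character (a hypothetical policy, chosen because it sends every occurrence of a given pattern to the same partition, which is one way the union of per-partition results can be exact), and it grows a prefix tree depth-first over occurrence lists, pruning a node's whole subtree once the node falls below the threshold: every extension of a pattern occurs at most as often as the pattern itself. The names `partition_by_first_char`, `frequent_patterns_in_partition` and `frequent_patterns` and the parameters `min_count`, `max_len` and `num_parts` are illustrative only.

```python
from collections import defaultdict


def partition_by_first_char(text, num_parts):
    """Hypothetical dividing policy: route each start position by its first character.

    Every occurrence of a pattern begins with the same character, so all of its
    occurrences land in one partition and partitions never share a pattern.
    """
    parts = [[] for _ in range(num_parts)]
    for pos, ch in enumerate(text):
        parts[ord(ch) % num_parts].append(pos)
    return parts


def frequent_patterns_in_partition(text, starts, min_count, max_len):
    """Depth-first prefix-tree growth over occurrence lists, with subtree pruning.

    Returns {pattern: count} for patterns of length 1..max_len that begin at one
    of `starts` and occur at least `min_count` times (overlaps count separately).
    """
    result = {}
    # Each stack entry is (current pattern length, start positions of its occurrences).
    stack = [(0, list(starts))]
    while stack:
        length, occ = stack.pop()
        if length == max_len:
            continue
        # Group occurrences by the next character: one group per child node.
        children = defaultdict(list)
        for pos in occ:
            nxt = pos + length
            if nxt < len(text):
                children[text[nxt]].append(pos)
        for child_occ in children.values():
            if len(child_occ) < min_count:
                # Prune: any extension of this child occurs at most len(child_occ) times.
                continue
            first = child_occ[0]
            result[text[first:first + length + 1]] = len(child_occ)
            stack.append((length + 1, child_occ))
    return result


def frequent_patterns(text, min_count, max_len, num_parts=4):
    """Process each partition independently and take the union of the results."""
    final = {}
    for starts in partition_by_first_char(text, num_parts):
        temporary = frequent_patterns_in_partition(text, starts, min_count, max_len)
        # Partitions share no patterns, so merging never overwrites a count.
        final.update(temporary)
    return final
```

For example, `frequent_patterns("abcabcabx", min_count=2, max_len=3)` keeps `abc` (2 occurrences) and prunes `abx` (1 occurrence) without exploring anything below it. Unlike a real large-scale implementation, this sketch keeps the whole text in memory and only splits the work by start position; a disk-backed version would presumably load and process one partition at a time.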
id	doaj-art-42175ba11a2e477cac121910cd65b603
institution	Kabale University
issn	1000-436X

Similar Items