Study on Chinese spam filtering system based on Bayes algorithm

Bibliographic Details
Main Authors: Haoran LIU, Pan DING, Changjiang GUO, Jinfeng CHANG, Jingchuang CUI
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2018-12-01
Series: Tongxin xuebao
Subjects: Bayesian network; TF-IDF; genetic algorithm; short text classification; Chinese spam filtering
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000−436x.2018281/
author Haoran LIU
Pan DING
Changjiang GUO
Jinfeng CHANG
Jingchuang CUI
collection DOAJ
description To address the high feature dimensionality of Chinese spam filtering systems, a TF-IDF feature extraction algorithm based on central word extension is proposed; it improves the expressive capacity of the nodes in the network and reduces the feature dimension. In addition, a three-layer structure model based on the GWO_GA structure learning algorithm is proposed to extend the limits of text features and increase their diversity. The new structure learning algorithm relaxes the conditional independence assumption on feature attributes. A fine classification layer is added between the class layer and the feature layer to increase feature coverage. Experiments show that the three-layer Bayesian network algorithm, which combines TF-IDF feature extraction based on central word extension with GWO_GA structure learning, improves the performance of Chinese spam filtering.
format Article
id doaj-art-10b13846cb364f30af974c1510c0cad9
institution Kabale University
issn 1000-436X
language zho
publishDate 2018-12-01
publisher Editorial Department of Journal on Communications
record_format Article
series Tongxin xuebao
title Study on Chinese spam filtering system based on Bayes algorithm
topic Bayesian network
TF-IDF
genetic algorithm
short text classification
Chinese spam filtering
url http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000−436x.2018281/
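
As a point of reference for the TF-IDF weighting named in the description, the following is a minimal Python sketch of standard TF-IDF over text that has already been segmented into words. It does not reproduce the central word extension or the GWO_GA structure learning from the article, whose details are not given in this record. The function name tf_idf and the sample messages are illustrative assumptions, not taken from the article.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = occurrences of the term in the document / document length
    idf = log(N / number of documents that contain the term)
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-segmented messages (toy data, not from the article).
docs = [
    ["免费", "领取", "奖品"],   # "free", "claim", "prize"
    ["会议", "通知", "明天"],   # "meeting", "notice", "tomorrow"
    ["免费", "会议", "资料"],   # "free", "meeting", "materials"
]
weights = tf_idf(docs)
```

In this toy corpus a term that occurs in only one message, such as "领取", receives a higher weight than "免费", which occurs in two of the three messages. Word segmentation is assumed to happen before this step.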