Classification of Textual E-Mail Spam Using Data Mining Techniques

A new method is proposed for clustering spam messages collected in the databases of an antispam system. A genetic algorithm is developed to solve the clustering problem. The objective function maximizes the similarity between messages within clusters, where similarity is defined by the k-nearest neighbor algorithm. Ap...

Bibliographic Details
Main Authors: Rasim M. Alguliev, Ramiz M. Aliguliyev, Saadat A. Nazirova
Format: Article
Language: English
Published: Wiley 2011-01-01
Series: Applied Computational Intelligence and Soft Computing
Online Access: http://dx.doi.org/10.1155/2011/416308
collection DOAJ
description A new method is proposed for clustering spam messages collected in the databases of an antispam system. A genetic algorithm is developed to solve the clustering problem. The objective function maximizes the similarity between messages within clusters, where similarity is defined by the k-nearest neighbor algorithm. Applying a genetic algorithm to constrained problems requires constantly maintaining the feasibility of chromosomes, which slows convergence. Therefore, to accelerate convergence, a penalty function is used that prevents infeasible chromosomes from arising when fitness values are ranked. After classification, knowledge extraction is applied to obtain information about the classes. A multidocument summarization method is used to build an information portrait of each cluster of spam messages. By classifying and parametrizing spam templates, it also becomes possible to determine how the subject matter of spam depends on its geographic origin (e.g., which subjects prevail in spam messages sent from certain countries). Thus, the proposed system will be able to reveal targeted information attacks if they occur. By analyzing the origins of the spam messages in the collection, it is possible to identify and uncover organized social networks of spammers.
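The clustering idea in the description can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy term-count vectors in DOCS, the use of cosine similarity, the constraint that every cluster must be non-empty, the penalty weight, the tournament selection scheme, and all function names are assumptions made for the example.

```python
import math
import random

# Toy term-count vectors for six messages (hypothetical data):
# the first three share one vocabulary, the last three another.
DOCS = [[3, 2, 0, 0], [2, 3, 0, 0], [3, 3, 1, 0],
        [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 2, 2]]

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_table(docs, k_nn):
    """For each message, list its k_nn most similar messages as (index, similarity)."""
    table = []
    for i, d in enumerate(docs):
        sims = sorted(((cosine(d, e), j) for j, e in enumerate(docs) if j != i),
                      reverse=True)
        table.append([(j, s) for s, j in sims[:k_nn]])
    return table

def fitness(chrom, table, n_clusters, penalty=10.0):
    """Similarity to same-cluster nearest neighbours, minus a penalty term."""
    score = sum(s for i, nbrs in enumerate(table)
                for j, s in nbrs if chrom[i] == chrom[j])
    # Assumed constraint: a chromosome is infeasible if any cluster is empty.
    empty = n_clusters - len(set(chrom))
    return score - penalty * empty

def evolve(docs, n_clusters, k_nn=2, pop_size=30, generations=60,
           mutation_rate=0.1, seed=0):
    """Evolve a list of cluster labels (one gene per message)."""
    rng = random.Random(seed)
    table = knn_table(docs, k_nn)
    n = len(docs)

    def score(chrom):
        return fitness(chrom, table, n_clusters)

    pop = [[rng.randrange(n_clusters) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0][:]]  # elitism: carry the best chromosome over unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents, then one-point crossover.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign a gene to a random cluster with small probability.
            child = [rng.randrange(n_clusters) if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)

if __name__ == "__main__":
    print(evolve(DOCS, n_clusters=2))
```

Because infeasible chromosomes are scored rather than repaired or discarded, they rank low and tend to drop out during selection, so no separate feasibility-repair step is needed in this sketch.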
id doaj-art-44ecb6558e1d4f7bba19b1522ce39686
institution OA Journals
issn 1687-9724
1687-9732
affiliation Institute of Information Technology of Azerbaijan National Academy of Sciences, 9 F. Agayev Street, Baku 1141, Azerbaijan (all three authors)
volume 2011
article_id 416308