The Development of the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS)

Spam e-mails are unsolicited e-mails received by users of the e-mail service. Spam e-mails cause serious harm to organizations, for they waste, among other things, their computational and networking resources. To reduce the damage caused by them, organizations use anti-spams. Anti-spams are software...

Full description

Saved in:
Bibliographic Details
Main Authors: Isaac C. Ferreira, Marcelo V. C. Aragao, Edvard M. Oliveira, Bruno T. Kuehne, Edmilson M. Moreira, Otavio A. S. Carpinteiro
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9565223/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Spam e-mails are unsolicited e-mails received by users of the e-mail service. Spam e-mails cause serious harm to organizations, for they waste, among other things, their computational and networking resources. To reduce the damage caused by them, organizations use anti-spams. Anti-spams are software systems that classify e-mails in order to separate legitimate from spam e-mails. The best current commercial and open-source anti-spams, and in particular the well-known commercial anti-spam CanIt-PRO, make use of various techniques, such as blacklists and/or SMTP extensions, to classify e-mails. Unfortunately, both blacklists and SMTP extensions have serious drawbacks, such as low scalability and high computational and network costs. This paper introduces the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS). Unlike the best current anti-spams, Open-MaLBAS does not make use of blacklists and SMTP extensions, but only of machine learning models for e-mail classification. Open-MaLBAS was compared to CanIt-PRO in a series of experiments on a database composed of 862,227 real e-mails, collected over three months at the Federal University of Itajubá, Brazil. The e-mails were previously classified by CanIt-PRO. From the experiments, it was observed that Open-MaLBAS was able to correctly classify 81.48% and 98.13% of the e-mails in the database, using, respectively, the two models — Multi-Layer Perceptron and Random Forest — evaluated. In addition, it managed to obtain times of up to 88% shorter than those of CanIt-PRO to classify all e-mails in the database. Open-MaLBAS is implemented in Java language, under free software license, for free use. It is available on GitHub.
ISSN:2169-3536