Towards Supercomputing Categorizing the Maliciousness upon Cybersecurity Blacklists with Concept Drift

In this article, we have carried out a case study to optimize the classification of the maliciousness of cybersecurity events by IP addresses using machine learning techniques. The optimization is studied focusing on time complexity. Firstly, we have used the extreme gradient boosting model, and sec...

Bibliographic Details
Main Authors: M. V. Carriegos, N. DeCastro-García, D. Escudero
Format: Article
Language: English
Published: Wiley 2023-01-01
Series: Computational and Mathematical Methods
Online Access: http://dx.doi.org/10.1155/2023/5780357
author M. V. Carriegos
N. DeCastro-García
D. Escudero
collection DOAJ
description In this article, we have carried out a case study to optimize the classification of the maliciousness of cybersecurity events by IP address using machine learning techniques. The optimization focuses on time complexity. First, we have used the extreme gradient boosting model, and second, we have parallelized the machine learning algorithm to study the effect of using different numbers of cores on the problem. We have classified the maliciousness of cybersecurity events in a biclass and a multiclass scenario. All the experiments have been carried out with a well-known optimal set of features: the geolocation information of the IP address. However, the geolocation features of an IP address can change over time. In addition, the relation between an IP address and its maliciousness label can change if the address is tested several times. The models’ performance could therefore degrade, because the information acquired from training on past samples may not generalize well to new samples. This situation is known as concept drift. For this reason, it is necessary to study whether the proposed optimization works in a concept drift scenario. The results show that concept drift does not degrade the models. Moreover, boosting algorithms achieve competitive or better performance compared to similar research works for the biclass scenario, and an effective categorization for the multiclass case. Regarding high-performance computing resources, the most efficient setting is reached using five nodes.
format Article
id doaj-art-90feca9bb262434a9718472a200eae06
institution Kabale University
issn 2577-7408
language English
publishDate 2023-01-01
publisher Wiley
record_format Article
series Computational and Mathematical Methods
spelling doaj-art-90feca9bb262434a9718472a200eae06; 2025-02-03T05:44:35Z; eng; Wiley; Computational and Mathematical Methods; 2577-7408; 2023-01-01; 2023; 10.1155/2023/5780357; Towards Supercomputing Categorizing the Maliciousness upon Cybersecurity Blacklists with Concept Drift; M. V. Carriegos (Departamento de Matemáticas), N. DeCastro-García (Departamento de Matemáticas), D. Escudero (RIASC); http://dx.doi.org/10.1155/2023/5780357
title Towards Supercomputing Categorizing the Maliciousness upon Cybersecurity Blacklists with Concept Drift
url http://dx.doi.org/10.1155/2023/5780357