Towards Supercomputing Categorizing the Maliciousness upon Cybersecurity Blacklists with Concept Drift

In this article, we have carried out a case study to optimize the classification of the maliciousness of cybersecurity events by IP addresses using machine learning techniques. The optimization is studied focusing on time complexity. Firstly, we have used the extreme gradient boosting model, and sec...

Bibliographic Details
Main Authors:	M. V. Carriegos, N. DeCastro-García, D. Escudero
Format:	Article
Language:	English
Published:	Wiley 2023-01-01
Series:	Computational and Mathematical Methods
Online Access:	http://dx.doi.org/10.1155/2023/5780357

_version_	1832556690574147584
author	M. V. Carriegos N. DeCastro-García D. Escudero
collection	DOAJ
description	In this article, we have carried out a case study to optimize the classification of the maliciousness of cybersecurity events by IP addresses using machine learning techniques. The optimization is studied focusing on time complexity. Firstly, we have used the extreme gradient boosting model, and secondly, we have parallelized the machine learning algorithm to study the effect of using a different number of cores for the problem. We have classified the cybersecurity events’ maliciousness in a biclass and a multiclass scenario. All the experiments have been carried out with a well-known optimal set of features: the geolocation information of the IP address. However, the geolocation features of an IP address can change over time. Also, the relation between the IP address and its label of maliciousness can be modified if we test the address several times. The models’ performance could then degrade because the information acquired from training on past samples may not generalize well to new samples. This situation is known as concept drift. For this reason, it is necessary to study whether the proposed optimization works in a concept drift scenario. The results show that concept drift does not degrade the models. Also, boosting algorithms achieve competitive or better performance than similar research works in the biclass scenario and an effective categorization in the multiclass case. Regarding high-performance computing resources, the most efficient setting is reached using five nodes.
format	Article
id	doaj-art-90feca9bb262434a9718472a200eae06
institution	Kabale University
issn	2577-7408
language	English
publishDate	2023-01-01
publisher	Wiley
record_format	Article
series	Computational and Mathematical Methods
spelling	doaj-art-90feca9bb262434a9718472a200eae06 | 2025-02-03T05:44:35Z | eng | Wiley | Computational and Mathematical Methods | 2577-7408 | 2023-01-01 | 2023 | 10.1155/2023/5780357 | Towards Supercomputing Categorizing the Maliciousness upon Cybersecurity Blacklists with Concept Drift | M. V. Carriegos (Departamento de Matemáticas), N. DeCastro-García (Departamento de Matemáticas), D. Escudero (RIASC) | http://dx.doi.org/10.1155/2023/5780357
title	Towards Supercomputing Categorizing the Maliciousness upon Cybersecurity Blacklists with Concept Drift
url	http://dx.doi.org/10.1155/2023/5780357

Similar Items