Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm

Bibliographic Details
Main Authors: Noemí DeCastro-García, Ángel Luis Muñoz Castañeda, David Escudero García, Miguel V. Carriegos
Format: Article
Language: English
Published: Wiley 2019-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2019/6278908
Collection: DOAJ
Abstract: Selecting the best configuration of hyperparameter values for a Machine Learning model directly affects the performance of the model on the dataset. It is a laborious task that usually requires deep knowledge of both the hyperparameter optimization methods and the Machine Learning algorithms. Although several automatic optimization techniques exist, they usually consume significant resources, increasing the dynamic complexity in order to obtain high accuracy. The available dataset is, among other factors, one of the most critical aspects of this computational cost. In this paper we therefore study the effect of using different partitions of a dataset in the hyperparameter optimization phase on the efficiency of a Machine Learning algorithm. Nonparametric inference is used to measure how often the accuracy, time complexity, and spatial complexity obtained with the partitions behave differently from those obtained with the whole dataset. In addition, a level of gain is assigned to each partition, which allows us to study patterns and identify which samples are more profitable. Cybersecurity is a discipline in which the efficiency of Artificial Intelligence techniques is key to extracting actionable knowledge, so the statistical analyses have been carried out over five Cybersecurity datasets.
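The abstract does not state which partitioning scheme or which nonparametric tests were used. The sketch below is purely illustrative and assumes simple random sampling without replacement plus an exact two-sided sign test; both are our choices and not necessarily the authors'. It shows how one could:

* draw a partition containing a given fraction of a dataset;
* check whether paired accuracy scores differ between tuning on the whole dataset and tuning on a partition.

The names `sample_partition` and `sign_test` are hypothetical helpers, and the accuracy values in the usage lines are made up rather than taken from the paper.

```python
import math
import random
from typing import Sequence


def sample_partition(rows: Sequence, fraction: float, seed: int = 0) -> list:
    """Hypothetical helper: draw a simple random sample (without replacement)
    holding `fraction` of the rows. The paper's actual partitioning scheme
    may differ (e.g. stratified sampling)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(list(rows), k)


def sign_test(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided exact sign test on paired observations (an assumed choice
    of nonparametric test). Ties (x_i == y_i) are discarded, as is standard
    for the sign test. Returns the p-value under H0: P(x_i > y_i) = 0.5."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    plus = sum(1 for a, b in zip(x, y) if a > b)
    minus = sum(1 for a, b in zip(x, y) if a < b)
    n = plus + minus
    if n == 0:
        return 1.0  # all pairs tied: no evidence of a difference
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Usage with made-up accuracies from 8 paired runs: hyperparameters tuned on
# the whole dataset vs. tuned on a 20% partition.
full = [0.91, 0.92, 0.90, 0.93, 0.91, 0.92, 0.94, 0.90]
part = [0.89, 0.90, 0.90, 0.91, 0.88, 0.91, 0.92, 0.89]
print(sign_test(full, part))  # prints 0.015625
```

In this toy example one pair is tied and is dropped. All seven remaining pairs favor the whole dataset, so the p-value is 2 / 2^7 = 0.015625. A real study would also record tuning time and memory per partition and would likely use stronger rank-based tests.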
ISSN: 1076-2787, 1099-0526
Affiliations: Noemí DeCastro-García and Miguel V. Carriegos, Departamento de Matemáticas, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain; Ángel Luis Muñoz Castañeda and David Escudero García, Research Institute on Applied Sciences in Cybersecurity, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain
Volume: 2019, Article ID 6278908