Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm
Selecting the best configuration of hyperparameter values for a Machine Learning model directly affects the performance of the model on the dataset. It is a laborious task that usually requires deep knowledge of both the hyperparameter optimization methods and the Machine Learning algorithms. Although...
| Main Authors: | Noemí DeCastro-García, Ángel Luis Muñoz Castañeda, David Escudero García, Miguel V. Carriegos |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2019-01-01 |
| Series: | Complexity |
| Online Access: | http://dx.doi.org/10.1155/2019/6278908 |
| _version_ | 1849305917196075008 |
|---|---|
| author | Noemí DeCastro-García; Ángel Luis Muñoz Castañeda; David Escudero García; Miguel V. Carriegos |
| author_facet | Noemí DeCastro-García; Ángel Luis Muñoz Castañeda; David Escudero García; Miguel V. Carriegos |
| author_sort | Noemí DeCastro-García |
| collection | DOAJ |
| description | Selecting the best configuration of hyperparameter values for a Machine Learning model directly affects the performance of the model on the dataset. It is a laborious task that usually requires deep knowledge of both the hyperparameter optimization methods and the Machine Learning algorithms. Although several automatic optimization techniques exist, they usually consume significant resources, increasing the dynamic complexity in order to obtain high accuracy. Since the available dataset is one of the most critical factors in this computational cost, in this paper we study the effect of using different partitions of a dataset in the hyperparameter optimization phase on the efficiency of a Machine Learning algorithm. Nonparametric inference is used to measure the rate of different behaviors of the accuracy, time, and spatial complexity obtained with the partitions and with the whole dataset. In addition, a level of gain is assigned to each partition, allowing us to study patterns and identify which samples are more profitable. Since Cybersecurity is a discipline in which the efficiency of Artificial Intelligence techniques is key to extracting actionable knowledge, the statistical analyses have been carried out over five Cybersecurity datasets. |
| format | Article |
| id | doaj-art-357ae8efb8124b17a368a76fc4c24731 |
| institution | Kabale University |
| issn | 1076-2787 1099-0526 |
| language | English |
| publishDate | 2019-01-01 |
| publisher | Wiley |
| record_format | Article |
| series | Complexity |
| spelling | doaj-art-357ae8efb8124b17a368a76fc4c24731; 2025-08-20T03:55:16Z; eng; Wiley; Complexity; 1076-2787; 1099-0526; 2019-01-01; 2019; 10.1155/2019/6278908; 6278908; Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm; Noemí DeCastro-García (0); Ángel Luis Muñoz Castañeda (1); David Escudero García (2); Miguel V. Carriegos (3); (0) Departamento de Matemáticas, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain; (1) Research Institute on Applied Sciences in Cybersecurity, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain; (2) Research Institute on Applied Sciences in Cybersecurity, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain; (3) Departamento de Matemáticas, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain; http://dx.doi.org/10.1155/2019/6278908 |
| title | Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm |
| url | http://dx.doi.org/10.1155/2019/6278908 |
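
The study design summarized in the description (optimizing hyperparameters on partitions of a dataset and comparing accuracy and time against the whole dataset) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record does not state which algorithms, optimization methods, partition sizes, or datasets the authors used, so this sketch substitutes a k-nearest-neighbors classifier, a plain grid search over `k`, synthetic data, and hypothetical function names. It also omits the nonparametric tests and the spatial-complexity measurement mentioned in the description.

```python
import random
import time


def make_dataset(n, seed=0):
    """Synthetic two-class data (an assumption, not one of the paper's datasets)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x + y + rng.gauss(0, 0.3) > 0 else 0
        data.append(((x, y), label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points classified correctly by k-NN fitted on `train`."""
    hits = sum(knn_predict(train, p, k) == label for p, label in test)
    return hits / len(test)


def optimize_k(sample, candidates=(1, 3, 5, 9, 15)):
    """Grid search over k with a 70/30 holdout split of `sample`.

    A stand-in for any hyperparameter optimization method.
    """
    split = int(0.7 * len(sample))
    fit, val = sample[:split], sample[split:]
    return max(candidates, key=lambda k: accuracy(fit, val, k))


def compare_partitions(fractions=(0.25, 0.5, 1.0), n=400, seed=0):
    """Tune k on each partition, then score the chosen k on a common test set."""
    data = make_dataset(n, seed)
    cut = int(0.75 * n)
    train, test = data[:cut], data[cut:]
    results = []
    for frac in fractions:
        # The points are i.i.d., so a prefix of `train` is a random sample of it.
        sample = train[: int(frac * len(train))]
        start = time.perf_counter()
        best_k = optimize_k(sample)
        elapsed = time.perf_counter() - start
        # The final model always uses the full training set; only the
        # hyperparameter search saw the smaller partition.
        results.append((frac, best_k, accuracy(train, test, best_k), elapsed))
    return results


if __name__ == "__main__":
    for frac, k, acc, secs in compare_partitions():
        print(f"fraction={frac:.2f}  best_k={k}  test_accuracy={acc:.3f}  hpo_time={secs:.4f}s")
```

Each result row pairs a partition fraction with the hyperparameter it selected, the resulting test accuracy, and the search time, which is the kind of per-partition comparison the description refers to when it assigns a level of gain to each partition.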