An automated parallel genetic algorithm with parametric adaptation for distributed data analysis

Bibliographic Details
Main Authors: Laila Al-Terkawi, Matteo Migliavacca
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
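Subjects: Genetic algorithms (GAs); GAs parameter control; Classification; Large-scale data processing; Spark
Online Access: https://doi.org/10.1038/s41598-025-93943-0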
collection DOAJ
description Abstract Unleashing the potential of large-scale data analysis requires advanced computational methods capable of managing the immense size and complexity of distributed data. Genetic algorithms (GAs), known for their adaptability, benefit significantly from parallelization, prompting ongoing enhancements to boost performance further. This study proposes the integration of automatic termination and population sizing mechanisms into parallel GAs to augment their flexibility and effectiveness. We extend PDMS-BioHEL and PDMD-BioHEL, two parallel GA-based classifiers implemented on the Spark platform, and through extensive experimentation, demonstrate the efficacy of our approach in enhancing computational efficiency and user-friendliness. However, while these automated strategies significantly reduce the need for manual parameter tuning, thereby increasing time efficiency, they may sometimes lead to a slight reduction in final solution accuracy, particularly under complex scenario conditions. This trade-off between efficiency and accuracy is critical, especially when high precision is paramount. Our techniques enable more efficient and effective large-scale data analysis using parallel GAs, providing a robust foundation for future advancements and inviting further investigation into balancing these aspects.
id doaj-art-5cd225c356bf47a0b1afd822f1560cde
institution Kabale University
issn 2045-2322
spelling doaj-art-5cd225c356bf47a0b1afd822f1560cde | 2025-08-20T03:40:49Z | eng | Nature Portfolio | Scientific Reports | ISSN 2045-2322 | 2025-03-01 | vol. 15, no. 1, pp. 1-16 | 10.1038/s41598-025-93943-0 | Laila Al-Terkawi (International University - Kuwait (IUK)); Matteo Migliavacca (School of Computing, University of Kent)
topic Genetic algorithms (GAs)
GAs parameter control
Classification
Large-scale data processing
Spark