An automated parallel genetic algorithm with parametric adaptation for distributed data analysis

Abstract Unleashing the potential of large-scale data analysis requires advanced computational methods capable of managing the immense size and complexity of distributed data. Genetic algorithms (GAs), known for their adaptability, benefit significantly from parallelization, prompting ongoing enhanc...

Bibliographic Details
Main Authors:	Laila Al-Terkawi, Matteo Migliavacca
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-03-01
Series:	Scientific Reports
Subjects:	Genetic algorithms (GAs); GAs parameter control; Classification; Large-scale data processing; Spark
Online Access:	https://doi.org/10.1038/s41598-025-93943-0

author	Laila Al-Terkawi; Matteo Migliavacca
collection	DOAJ
description	Abstract Unleashing the potential of large-scale data analysis requires advanced computational methods capable of managing the immense size and complexity of distributed data. Genetic algorithms (GAs), known for their adaptability, benefit significantly from parallelization, prompting ongoing enhancements to boost performance further. This study proposes the integration of automatic termination and population sizing mechanisms into parallel GAs to augment their flexibility and effectiveness. We extend PDMS-BioHEL and PDMD-BioHEL, two parallel GA-based classifiers implemented on the Spark platform, and through extensive experimentation, demonstrate the efficacy of our approach in enhancing computational efficiency and user-friendliness. However, while these automated strategies significantly reduce the need for manual parameter tuning, thereby increasing time efficiency, they may sometimes lead to a slight reduction in final solution accuracy, particularly under complex scenario conditions. This trade-off between efficiency and accuracy is critical, especially when high precision is paramount. Our techniques enable more efficient and effective large-scale data analysis using parallel GAs, providing a robust foundation for future advancements and inviting further investigation into balancing these aspects.
format	Article
id	doaj-art-5cd225c356bf47a0b1afd822f1560cde
institution	Kabale University
issn	2045-2322
language	English
publishDate	2025-03-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-5cd225c356bf47a0b1afd822f1560cde; 2025-08-20T03:40:49Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-03-01; 15; 1; 1; 16; 10.1038/s41598-025-93943-0; An automated parallel genetic algorithm with parametric adaptation for distributed data analysis; Laila Al-Terkawi (0); Matteo Migliavacca (1); International University - Kuwait (IUK); School of Computing, University of Kent; https://doi.org/10.1038/s41598-025-93943-0; Genetic algorithms (GAs); GAs parameter control; Classification; Large-scale data processing; Spark
title	An automated parallel genetic algorithm with parametric adaptation for distributed data analysis
topic	Genetic algorithms (GAs); GAs parameter control; Classification; Large-scale data processing; Spark
url	https://doi.org/10.1038/s41598-025-93943-0

Similar Items