An automated parallel genetic algorithm with parametric adaptation for distributed data analysis
Abstract Unleashing the potential of large-scale data analysis requires advanced computational methods capable of managing the immense size and complexity of distributed data. Genetic algorithms (GAs), known for their adaptability, benefit significantly from parallelization, prompting ongoing enhanc...
| Main Authors: | Laila Al-Terkawi, Matteo Migliavacca |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Reports |
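| Subjects: | Genetic algorithms (GAs); GAs parameter control; Classification; Large-scale data processing; Spark |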
| Online Access: | https://doi.org/10.1038/s41598-025-93943-0 |
| _version_ | 1849392201319055360 |
|---|---|
| author | Laila Al-Terkawi; Matteo Migliavacca |
| author_facet | Laila Al-Terkawi; Matteo Migliavacca |
| author_sort | Laila Al-Terkawi |
| collection | DOAJ |
| description | Abstract Unleashing the potential of large-scale data analysis requires advanced computational methods capable of managing the immense size and complexity of distributed data. Genetic algorithms (GAs), known for their adaptability, benefit significantly from parallelization, prompting ongoing enhancements to boost performance further. This study proposes the integration of automatic termination and population sizing mechanisms into parallel GAs to augment their flexibility and effectiveness. We extend PDMS-BioHEL and PDMD-BioHEL, two parallel GA-based classifiers implemented on the Spark platform, and through extensive experimentation, demonstrate the efficacy of our approach in enhancing computational efficiency and user-friendliness. However, while these automated strategies significantly reduce the need for manual parameter tuning, thereby increasing time efficiency, they may sometimes lead to a slight reduction in final solution accuracy, particularly under complex scenario conditions. This trade-off between efficiency and accuracy is critical, especially when high precision is paramount. Our techniques enable more efficient and effective large-scale data analysis using parallel GAs, providing a robust foundation for future advancements and inviting further investigation into balancing these aspects. |
| format | Article |
| id | doaj-art-5cd225c356bf47a0b1afd822f1560cde |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-5cd225c356bf47a0b1afd822f1560cde; 2025-08-20T03:40:49Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-03-01; vol. 15, no. 1, pp. 1-16; 10.1038/s41598-025-93943-0; An automated parallel genetic algorithm with parametric adaptation for distributed data analysis; Laila Al-Terkawi (International University - Kuwait (IUK)); Matteo Migliavacca (School of Computing, University of Kent); Abstract Unleashing the potential of large-scale data analysis requires advanced computational methods capable of managing the immense size and complexity of distributed data. Genetic algorithms (GAs), known for their adaptability, benefit significantly from parallelization, prompting ongoing enhancements to boost performance further. This study proposes the integration of automatic termination and population sizing mechanisms into parallel GAs to augment their flexibility and effectiveness. We extend PDMS-BioHEL and PDMD-BioHEL, two parallel GA-based classifiers implemented on the Spark platform, and through extensive experimentation, demonstrate the efficacy of our approach in enhancing computational efficiency and user-friendliness. However, while these automated strategies significantly reduce the need for manual parameter tuning, thereby increasing time efficiency, they may sometimes lead to a slight reduction in final solution accuracy, particularly under complex scenario conditions. This trade-off between efficiency and accuracy is critical, especially when high precision is paramount. Our techniques enable more efficient and effective large-scale data analysis using parallel GAs, providing a robust foundation for future advancements and inviting further investigation into balancing these aspects.; https://doi.org/10.1038/s41598-025-93943-0; Genetic algorithms (GAs); GAs parameter control; Classification; Large-scale data processing; Spark |
| spellingShingle | Laila Al-Terkawi; Matteo Migliavacca; An automated parallel genetic algorithm with parametric adaptation for distributed data analysis; Scientific Reports; Genetic algorithms (GAs); GAs parameter control; Classification; Large-scale data processing; Spark |
| title | An automated parallel genetic algorithm with parametric adaptation for distributed data analysis |
| title_sort | automated parallel genetic algorithm with parametric adaptation for distributed data analysis |
| topic | Genetic algorithms (GAs); GAs parameter control; Classification; Large-scale data processing; Spark |
| url | https://doi.org/10.1038/s41598-025-93943-0 |
| work_keys_str_mv | AT lailaalterkawi anautomatedparallelgeneticalgorithmwithparametricadaptationfordistributeddataanalysis AT matteomigliavacca anautomatedparallelgeneticalgorithmwithparametricadaptationfordistributeddataanalysis AT lailaalterkawi automatedparallelgeneticalgorithmwithparametricadaptationfordistributeddataanalysis AT matteomigliavacca automatedparallelgeneticalgorithmwithparametricadaptationfordistributeddataanalysis |