Exploration-Driven Genetic Algorithms for Hyperparameter Optimisation in Deep Reinforcement Learning

Bibliographic Details
Main Authors: Bartłomiej Brzęk, Barbara Probierz, Jan Kozak
Format: Article
Language: English
Published: MDPI AG, 2025-02-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/4/2067
Description
Summary: This paper investigates the application of genetic algorithms (GAs) to hyperparameter optimisation in deep reinforcement learning (RL), focusing on the Deep Q-Learning (DQN) algorithm. The study aims to identify approaches that enhance RL model performance through effective exploration of the configuration space, comparing different GA methods for selection, crossover, and mutation on deep RL models. The results indicate that GA techniques emphasising exploration of the configuration space yield significant improvements in optimisation efficiency, reducing training time and enhancing convergence. The most effective GA improved the fitness function value from 68.26 (initial best chromosome) to 979.16 after 200 iterations, demonstrating the efficacy of the proposed approach. Furthermore, variations in specific hyperparameters, such as the learning rate, gamma, and update frequency, were shown to substantially affect the DQN model's learning ability. These findings suggest that exploration-driven GA strategies outperform GA approaches with limited exploration, underscoring the critical role of selection and crossover methods in enhancing DQN model efficiency and performance. Moreover, a mini case study on the CartPole environment revealed that even a 5% sensor dropout impaired the performance of a GA-optimised RL agent, while a 20% dropout almost entirely halted improvement.
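
As a rough illustration of the technique the summary describes (not the authors' implementation, which is not reproduced in this record), the sketch below shows a minimal exploration-oriented GA over three of the hyperparameters the abstract names. The search ranges, tournament selection, uniform crossover, mutation rate, and the surrogate fitness function are all assumptions; in the paper, fitness would come from training and evaluating a DQN agent with each candidate configuration.

```python
import random

# Illustrative search space; these ranges are assumptions, not the paper's.
SPACE = {
    "learning_rate": (1e-5, 1e-2),
    "gamma": (0.90, 0.999),
    "update_frequency": (1, 100),  # would be rounded to an int in practice
}

def random_chromosome():
    """Sample one hyperparameter configuration uniformly from SPACE."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}

def fitness(chrom):
    """Placeholder fitness. In the paper this would train a DQN agent with
    `chrom` and return a performance score (e.g. mean episode reward);
    a cheap surrogate keeps the sketch runnable."""
    target = {"learning_rate": 1e-3, "gamma": 0.99, "update_frequency": 10}
    return -sum(abs(chrom[k] - target[k]) for k in SPACE)

def tournament(pop, scores, k=3):
    """Tournament selection: return the best of k random individuals."""
    picks = random.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]

def uniform_crossover(a, b):
    """Uniform crossover: each gene comes from either parent with p = 0.5."""
    return {k: random.choice((a[k], b[k])) for k in SPACE}

def mutate(chrom, rate=0.2):
    """Resample each gene with probability `rate`; a high rate keeps the
    search exploratory, mirroring the exploration-driven strategies."""
    return {k: random.uniform(*SPACE[k]) if random.random() < rate else v
            for k, v in chrom.items()}

def run_ga(pop_size=20, generations=200):
    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        pop = [mutate(uniform_crossover(tournament(pop, scores),
                                        tournament(pop, scores)))
               for _ in range(pop_size)]
    return max(pop, key=fitness)

if __name__ == "__main__":
    print(run_ga())
```

The sensor-dropout case study could likewise be approximated with a Gymnasium observation wrapper. How the paper actually injects dropout is not stated in this record, so the independent zero-masking below is only one plausible reading.

```python
import numpy as np
import gymnasium as gym

class SensorDropout(gym.ObservationWrapper):
    """Zero out each observation component independently with probability p.
    This particular dropout mechanism is an assumption; the paper may model
    sensor failure differently."""
    def __init__(self, env, p):
        super().__init__(env)
        self.p = p

    def observation(self, obs):
        mask = np.random.random(obs.shape) >= self.p
        return obs * mask

env = SensorDropout(gym.make("CartPole-v1"), p=0.05)  # 5% dropout, as in the study
obs, info = env.reset()
```
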
ISSN: 2076-3417