Exploring the possibilities of MADDPG for UAV swarm control by simulating in Pac-Man environment
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | National Aerospace University «Kharkiv Aviation Institute», 2025-02-01 |
| Series: | Радіоелектронні і комп'ютерні системи (Radioelectronic and Computer Systems) |
| Subjects: | |
| Online Access: | http://nti.khai.edu/ojs/index.php/reks/article/view/2789 |
| Summary: | This paper explores the application of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) for model training to control UAV swarms in dynamic and adversarial scenarios. In a modified Pac-Man environment, Pac-Man represents a target UAV, and the Ghosts represent the UAV swarm that counteracts it. The grid-based representation of Pac-Man mazes is used as an abstraction of a two-dimensional terrain model: a plane of pathways with obstacles corresponding to UAV flight conditions at a certain altitude. The proposed approach provides a clear discretization of space, simplifying pathfinding, collision avoidance, and the planning of reconnaissance or interception routes by combining decentralized local autonomy with centralized training, which enables UAVs to coordinate effectively and adapt quickly to changing conditions. This study evaluates the performance of adversaries controlled by MADDPG-trained models against heuristic navigation strategies such as A* and Breadth-First Search (BFS). Traditional rule-based pursuit and prediction algorithms, inspired by the behaviors of the Blinky and Pinky ghosts from the classic Pac-Man game, are included as benchmarks to assess the impact of learning-based methods. The purpose of this study was to determine the effectiveness of MADDPG-trained models in enhancing UAV swarm control by analyzing their adaptability and coordination capabilities in adversarial environments through computer modeling in simplified, mission-like 2D environments. Experiments conducted across varying levels of terrain complexity revealed that the MADDPG-trained model demonstrated superior adaptability and strategic coordination compared with the rule-based methods. Ghosts controlled by a model trained via MADDPG significantly reduced the success rate of Pac-Man agents, particularly in highly constrained environments, emphasizing the potential of learning-based adversarial strategies in UAV applications such as urban navigation, defense, and surveillance. Conclusions. MADDPG is a promising, robust framework for training models to control UAV swarms, particularly in adversarial settings. This study highlights its adaptability and its ability to outperform traditional rule-based methods in dynamic and complex environments. Future research should focus on comparing the effectiveness of MADDPG-trained models with multi-agent algorithms such as Expectimax, Alpha-Beta Pruning, and Monte Carlo Tree Search (MCTS) to further understand the advantages and limitations of learning-based approaches relative to traditional decision-making methods in collaborative and adversarial UAV operations. Additionally, exploring 3D implementations that integrate maze height decomposition and flight restrictions, as well as incorporating cybersecurity considerations and real-world threats such as anti-drone systems and electronic warfare, will enhance the robustness and applicability of these methods in realistic UAV scenarios. (Illustrative code sketches of the benchmark heuristics and the MADDPG update structure appear below the record.) |
| ISSN: | 1814-4225, 2663-2012 |
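
The summary names two rule-based benchmark pursuers inspired by the classic Blinky and Pinky ghosts. Below is a minimal Python sketch of that targeting logic, assuming the classic arcade behavior (Blinky chases the evader's current tile; Pinky aims a few tiles ahead of the evader's heading); the function names and the `lookahead` parameter are illustrative, not taken from the paper.

```python
# Hypothetical sketch of the two rule-based benchmark pursuers (assumed from
# classic Pac-Man ghost behavior, not the paper's code).

from typing import Tuple

Cell = Tuple[int, int]  # (row, col) on the grid maze


def blinky_target(evader_pos: Cell) -> Cell:
    """Direct pursuit: aim at the target UAV's current cell."""
    return evader_pos


def pinky_target(evader_pos: Cell, evader_dir: Cell, lookahead: int = 4) -> Cell:
    """Predictive pursuit: aim `lookahead` cells ahead of the target's heading."""
    r, c = evader_pos
    dr, dc = evader_dir  # unit step, e.g. (0, 1) for "east"
    return (r + dr * lookahead, c + dc * lookahead)
```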
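BFS is cited as one of the heuristic navigation baselines on the grid-maze abstraction. The following self-contained sketch plans a shortest four-connected path on such a grid, under the assumption that walls are encoded as 1 and free cells as 0 (the encoding and names are illustrative guesses, not the paper's).

```python
# Minimal BFS planner on a 0/1 grid maze (1 = wall, four-connected moves).
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def bfs_path(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Return a shortest start-to-goal path, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    parent: Dict[Cell, Optional[Cell]] = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```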
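The centralized-training/decentralized-execution structure that the summary attributes to MADDPG can be sketched as follows: each Ghost agent keeps a private actor fed only its own observation, while its critic is trained on the joint observations and actions of all agents. This PyTorch sketch is a simplification, not the paper's implementation: it omits MADDPG's target networks, replay buffer, and exploration noise, and all dimensions and hyperparameters are placeholder assumptions.

```python
# Compressed sketch of one MADDPG gradient step (simplified: no target nets,
# no replay buffer, no exploration noise). Sizes below are assumed toy values.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 16, 4
GAMMA = 0.95


def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))


# Decentralized actors: one per agent, each sees only its own observation.
actors = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
# Centralized critics: each scores the JOINT observation-action vector.
critics = [mlp(N_AGENTS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_AGENTS)]
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]


def maddpg_step(obs, acts, rewards, next_obs, done):
    """One update from a batch; obs/acts/next_obs are lists of per-agent tensors."""
    next_acts = [a(o) for a, o in zip(actors, next_obs)]
    for i in range(N_AGENTS):
        # Critic i: regress Q_i(x, a_1..a_N) toward the one-step TD target.
        with torch.no_grad():
            q_next = critics[i](torch.cat(next_obs + next_acts, dim=-1))
            target = rewards[i] + GAMMA * (1.0 - done) * q_next
        q = critics[i](torch.cat(obs + acts, dim=-1))
        critic_loss = nn.functional.mse_loss(q, target)
        critic_opts[i].zero_grad()
        critic_loss.backward()
        critic_opts[i].step()

        # Actor i: ascend Q_i with agent i's action replaced by its policy output.
        acts_i = [a.detach() for a in acts]
        acts_i[i] = actors[i](obs[i])
        actor_loss = -critics[i](torch.cat(obs + acts_i, dim=-1)).mean()
        actor_opts[i].zero_grad()
        actor_loss.backward()
        actor_opts[i].step()
```

At deployment only the actors are needed, each conditioned on its own local observation, which is what lets the trained Ghosts act without the centralized critic; the critics exist solely to shape training.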