Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat

This study proposes an unmanned combat aerial vehicle (UCAV)-oriented hierarchical reinforcement learning framework to address the temporal abstraction challenge in autonomous within-visual-range air combat (WVRAC). The incorporation of maximum-entropy objectives within the proposed MEOL framework...

Bibliographic Details
Main Authors: Yang Li, Wenhan Dong, Pin Zhang, Hengang Zhai, Guangqi Li
Format: Article
Language:English
Published: MDPI AG 2025-05-01
Series:Drones
Subjects:
Online Access:https://www.mdpi.com/2504-446X/9/5/384
_version_ 1849711276310134784
author Yang Li
Wenhan Dong
Pin Zhang
Hengang Zhai
Guangqi Li
author_facet Yang Li
Wenhan Dong
Pin Zhang
Hengang Zhai
Guangqi Li
author_sort Yang Li
collection DOAJ
description This study proposes an unmanned combat aerial vehicle (UCAV)-oriented hierarchical reinforcement learning framework to address the temporal abstraction challenge in autonomous within-visual-range air combat (WVRAC). The incorporation of maximum-entropy objectives within the proposed MEOL framework facilitates the optimization of both autonomous low-level tactical discovery and high-level option selection. At the low level, three tactical policies (angle, snapshot, and energy tactics) are designed with reward functions informed by expert knowledge, while the high-level policy dynamically terminates the current tactic and selects a new one through sparse-reward learning, thus overcoming the limitations of fixed-duration tactical execution. Furthermore, a novel automatic curriculum generation mechanism based on Wasserstein Generative Adversarial Networks (WGANs) is introduced to improve training efficiency and adaptability to diverse initial combat conditions. Extensive experiments in UCAV air combat simulations show that MEOL not only achieves significantly higher win rates than baseline policies when trained against rule-based opponents, but also outperforms the individual tactical intra-option policies as well as other option-learning methods in testing. The framework’s dynamic termination and switching of tactics addresses the limitations of fixed-duration hierarchical methods. Ablation studies confirm that the WGAN-based curricula accelerate policy convergence. Additionally, visual analysis of the UCAVs’ flight logs validates the learned hierarchical decision-making process, showcasing the interplay between tactical selection and manoeuvring execution. This research provides novel methodologies that combine hierarchical reinforcement learning with tactical domain knowledge for the autonomous decision-making of UCAVs in complex air combat scenarios.
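The control flow the abstract describes — a high-level policy that selects one of three tactical options and learns when to terminate it, with low-level policies executing manoeuvres in between — can be sketched as a minimal call-and-return option loop. All names and bodies below (`high_level_select`, `should_terminate`, `low_level_step`) are hypothetical stand-ins for the paper's learned networks and reward functions, included only to make the execution structure concrete.

```python
import random

# Illustrative sketch of a call-and-return option loop; the policies and
# termination function here are random stand-ins, NOT the paper's models.
TACTICS = ["angle", "snapshot", "energy"]

def high_level_select(state):
    """Stand-in for the high-level policy: choose a tactic index."""
    return random.randrange(len(TACTICS))

def should_terminate(state, step_in_option):
    """Stand-in for the learned termination function: a small random
    termination probability plus a hard cap, to make the flow concrete."""
    return step_in_option >= 10 or random.random() < 0.1

def low_level_step(state, tactic):
    """Stand-in for the intra-option (low-level) policy: advance the state."""
    return state + 1

def run_episode(max_steps=100, seed=0):
    random.seed(seed)
    state, step, trace = 0, 0, []
    while step < max_steps:
        option = high_level_select(state)           # high level picks a tactic
        step_in_option = 0
        while step < max_steps and not should_terminate(state, step_in_option):
            state = low_level_step(state, TACTICS[option])  # low level manoeuvres
            step += 1
            step_in_option += 1
        trace.append((TACTICS[option], step_in_option))     # record the switch
    return trace
```

Because termination is decided every step rather than after a fixed horizon, tactic durations in `trace` vary — which is exactly the fixed-duration limitation the framework is said to remove.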
format Article
id doaj-art-7ef7b3753e80456ab38ab3c6bcd4d135
institution DOAJ
issn 2504-446X
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Drones
spelling doaj-art-7ef7b3753e80456ab38ab3c6bcd4d1352025-08-20T03:14:39ZengMDPI AGDrones2504-446X2025-05-019538410.3390/drones9050384Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air CombatYang Li0Wenhan Dong1Pin Zhang2Hengang Zhai3Guangqi Li4Aviation Engineering School, Air Force Engineering University, Xi’an 710038, ChinaAviation Engineering School, Air Force Engineering University, Xi’an 710038, ChinaAviation Engineering School, Air Force Engineering University, Xi’an 710038, ChinaScientific Research and Academic Division, Air Force Engineering University, Xi’an 710038, ChinaThe School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, ChinaThis study proposes an unmanned combat aerial vehicle (UCAV)-oriented hierarchical reinforcement learning framework to address the temporal abstraction challenge in autonomous within-visual-range air combat (WVRAC) for UCAVs. The incorporation of maximum-entropy objectives within the MEOL framework facilitates the optimization of both autonomous low-level tactical discovery and high-level option selection. At the low level, three tactical policies (angle, snapshot, and energy tactics) are designed with reward functions informed by expert knowledge, while the high-level policy dynamically terminates current tactics and selects new ones through sparse reward learning, thus overcoming the limitations of fixed-duration tactical execution. Furthermore, a novel automatic curriculum generation mechanism based on Wasserstein Generative Adversarial Networks (WGANs) is introduced to enhance training efficiency and adaptability to diverse initial combat conditions. 
Extensive experiments conducted in UCAV air combat simulations have shown that MEOL not only achieves significantly better win rates than other policies when training against rule-based opponents, but also that MEOL achieves superior results in tests against tactical intra-option policies as well as other option learning policies. The framework facilitates dynamic termination and switching of tactics, thereby addressing the limitations of fixed-duration hierarchical methods. Ablation studies confirm the effectiveness of WGAN-based curricula in accelerating policy convergence. Additionally, the visual analysis of UCAVs’ flight logs validates the learned hierarchical decision-making process, showcasing the interplay between tactical selection and manoeuvring execution. This research provides novel methodologies combining hierarchical reinforcement learning with tactical domain knowledge for the autonomous decision-making of UCAVs in complex air combat scenarios.https://www.mdpi.com/2504-446X/9/5/384unmanned combat aerial vehicleswithin-visual-range air combathierarchical reinforcement learningWassersteinmaximum-entropy learning frameworktemporal abstraction
spellingShingle Yang Li
Wenhan Dong
Pin Zhang
Hengang Zhai
Guangqi Li
Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat
Drones
unmanned combat aerial vehicles
within-visual-range air combat
hierarchical reinforcement learning
Wasserstein
maximum-entropy learning framework
temporal abstraction
title Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat
title_full Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat
title_fullStr Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat
title_full_unstemmed Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat
title_short Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat
title_sort hierarchical reinforcement learning with automatic curriculum generation for unmanned combat aerial vehicle tactical decision making in autonomous air combat
topic unmanned combat aerial vehicles
within-visual-range air combat
hierarchical reinforcement learning
Wasserstein
maximum-entropy learning framework
temporal abstraction
url https://www.mdpi.com/2504-446X/9/5/384
work_keys_str_mv AT yangli hierarchicalreinforcementlearningwithautomaticcurriculumgenerationforunmannedcombataerialvehicletacticaldecisionmakinginautonomousaircombat
AT wenhandong hierarchicalreinforcementlearningwithautomaticcurriculumgenerationforunmannedcombataerialvehicletacticaldecisionmakinginautonomousaircombat
AT pinzhang hierarchicalreinforcementlearningwithautomaticcurriculumgenerationforunmannedcombataerialvehicletacticaldecisionmakinginautonomousaircombat
AT hengangzhai hierarchicalreinforcementlearningwithautomaticcurriculumgenerationforunmannedcombataerialvehicletacticaldecisionmakinginautonomousaircombat
AT guangqili hierarchicalreinforcementlearningwithautomaticcurriculumgenerationforunmannedcombataerialvehicletacticaldecisionmakinginautonomousaircombat