Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat
| Main Authors: | Yang Li; Wenhan Dong; Pin Zhang; Hengang Zhai; Guangqi Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Drones |
| Subjects: | unmanned combat aerial vehicles; within-visual-range air combat; hierarchical reinforcement learning; Wasserstein; maximum-entropy learning framework; temporal abstraction |
| Online Access: | https://www.mdpi.com/2504-446X/9/5/384 |
| _version_ | 1849711276310134784 |
|---|---|
| author | Yang Li; Wenhan Dong; Pin Zhang; Hengang Zhai; Guangqi Li |
| author_sort | Yang Li |
| collection | DOAJ |
| description | This study proposes an unmanned combat aerial vehicle (UCAV)-oriented hierarchical reinforcement learning framework, MEOL, to address the temporal abstraction challenge in autonomous within-visual-range air combat (WVRAC). The incorporation of maximum-entropy objectives within the MEOL framework facilitates the joint optimization of autonomous low-level tactical discovery and high-level option selection. At the low level, three tactical policies (angle, snapshot, and energy tactics) are designed with reward functions informed by expert knowledge, while the high-level policy dynamically terminates the current tactic and selects a new one through sparse-reward learning, thus overcoming the limitations of fixed-duration tactical execution. Furthermore, a novel automatic curriculum generation mechanism based on Wasserstein Generative Adversarial Networks (WGANs) is introduced to enhance training efficiency and adaptability to diverse initial combat conditions. Extensive experiments in UCAV air combat simulations show that MEOL not only achieves significantly higher win rates than other policies when trained against rule-based opponents, but also achieves superior results in tests against the individual tactical intra-option policies as well as other option-learning policies. The framework facilitates dynamic termination and switching of tactics, thereby addressing the limitations of fixed-duration hierarchical methods. Ablation studies confirm the effectiveness of WGAN-based curricula in accelerating policy convergence. Additionally, visual analysis of the UCAVs’ flight logs validates the learned hierarchical decision-making process, showcasing the interplay between tactical selection and manoeuvring execution. This research provides a novel methodology combining hierarchical reinforcement learning with tactical domain knowledge for the autonomous decision-making of UCAVs in complex air combat scenarios. |
| format | Article |
| id | doaj-art-7ef7b3753e80456ab38ab3c6bcd4d135 |
| institution | DOAJ |
| issn | 2504-446X |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Drones |
| doi | 10.3390/drones9050384 |
| citation | Drones, vol. 9, no. 5, art. 384 (2025-05-01) |
| affiliations | Yang Li, Wenhan Dong, Pin Zhang: Aviation Engineering School, Air Force Engineering University, Xi’an 710038, China; Hengang Zhai: Scientific Research and Academic Division, Air Force Engineering University, Xi’an 710038, China; Guangqi Li: The School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China |
| title | Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat |
| topic | unmanned combat aerial vehicles; within-visual-range air combat; hierarchical reinforcement learning; Wasserstein; maximum-entropy learning framework; temporal abstraction |
| url | https://www.mdpi.com/2504-446X/9/5/384 |
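The hierarchical control scheme the abstract describes — a high-level policy that selects among three tactical options (angle, snapshot, energy) and learns when to terminate the current one, while the selected low-level policy issues manoeuvre commands every step — corresponds to the classic call-and-return option loop. The following is an illustrative sketch, not the authors' code: the policies are random stubs, and all names and the toy environment are assumptions.

```python
# Sketch of option execution with learned termination (call-and-return).
# The learned components of MEOL are replaced here by random placeholders.
import random

TACTICS = ["angle", "snapshot", "energy"]

def high_level_select(state):
    # Stand-in for the high-level option-selection policy (sparse reward).
    return random.choice(TACTICS)

def high_level_terminate(state, tactic):
    # Stand-in for the learned termination condition: True means the
    # current tactic is abandoned and a new one is chosen.
    return random.random() < 0.1

def low_level_action(state, tactic):
    # Stand-in for the tactic's intra-option policy (dense, expert-shaped reward).
    return {"tactic": tactic, "stick": random.uniform(-1.0, 1.0)}

def run_episode(env_step, init_state, max_steps=200):
    """Run one episode; return how many times a tactic was (re)selected."""
    state, tactic, switches = init_state, None, 0
    for _ in range(max_steps):
        if tactic is None or high_level_terminate(state, tactic):
            tactic = high_level_select(state)  # dynamic tactic switching
            switches += 1
        state = env_step(state, low_level_action(state, tactic))
    return switches

# Dummy environment that ignores the action; the tactic is always selected
# at least once (first step), so switches >= 1.
print(run_episode(lambda s, a: s, init_state=0))
```

The point of the structure is what the abstract contrasts with fixed-duration methods: termination is queried every step, so a tactic runs for however long the high-level policy deems it useful rather than for a preset horizon.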
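The WGAN-based curriculum is described only at the level of "generate diverse initial combat conditions to improve training efficiency". One common way to realize such a curriculum is to keep generated conditions whose difficulty is intermediate for the current policy; the sketch below shows that sampling interface with the WGAN generator replaced by a stub. The win-rate band, condition fields, and value ranges are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of curriculum sampling over generated initial conditions.
# The trained WGAN generator and the rollout evaluator are stubbed out.
import random

def generator_sample():
    # Stand-in for the WGAN generator output: a hypothetical initial
    # condition (relative bearing deg, range m, altitude difference m).
    return (random.uniform(-180, 180),
            random.uniform(500, 5000),
            random.uniform(-1000, 1000))

def estimated_win_rate(policy, condition):
    # Stand-in for rollout-based evaluation of the current policy.
    return random.random()

def sample_curriculum(policy, n=32, band=(0.2, 0.8), max_tries=10_000):
    """Keep generated initial conditions of intermediate difficulty,
    i.e. conditions the current policy neither always wins nor always loses."""
    kept = []
    for _ in range(max_tries):
        cond = generator_sample()
        if band[0] <= estimated_win_rate(policy, cond) <= band[1]:
            kept.append(cond)
            if len(kept) == n:
                break
    return kept

batch = sample_curriculum(policy=None, n=8)
print(len(batch))
```

Filtering to a difficulty band is what makes this a curriculum: as the policy improves, previously hard conditions drift into the band and get sampled, gradually covering the diverse initial situations the abstract mentions.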