Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat
| Main Authors: | Yang Li; Wenhan Dong; Pin Zhang; Hengang Zhai; Guangqi Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Drones |
| Subjects: | unmanned combat aerial vehicles; within-visual-range air combat; hierarchical reinforcement learning; Wasserstein; maximum-entropy learning framework; temporal abstraction |
| Online Access: | https://www.mdpi.com/2504-446X/9/5/384 |
| _version_ | 1849711276310134784 |
|---|---|
| author | Yang Li; Wenhan Dong; Pin Zhang; Hengang Zhai; Guangqi Li |
| author_sort | Yang Li |
| collection | DOAJ |
| description | This study proposes an unmanned combat aerial vehicle (UCAV)-oriented hierarchical reinforcement learning framework, MEOL, to address the temporal abstraction challenge in autonomous within-visual-range air combat (WVRAC). The incorporation of maximum-entropy objectives within the MEOL framework facilitates the joint optimization of autonomous low-level tactical discovery and high-level option selection. At the low level, three tactical policies (angle, snapshot, and energy tactics) are designed with reward functions informed by expert knowledge, while the high-level policy dynamically terminates the current tactic and selects a new one through sparse-reward learning, thus overcoming the limitations of fixed-duration tactical execution. Furthermore, a novel automatic curriculum generation mechanism based on Wasserstein Generative Adversarial Networks (WGANs) is introduced to enhance training efficiency and adaptability to diverse initial combat conditions. Extensive experiments in UCAV air combat simulations show that MEOL not only achieves significantly higher win rates than other policies when trained against rule-based opponents, but also achieves superior results in tests against the individual tactical intra-option policies as well as other option-learning policies. The framework facilitates dynamic termination and switching of tactics, thereby addressing the limitations of fixed-duration hierarchical methods. Ablation studies confirm the effectiveness of WGAN-based curricula in accelerating policy convergence. Additionally, visual analysis of the UCAVs’ flight logs validates the learned hierarchical decision-making process, showcasing the interplay between tactical selection and manoeuvring execution. This research provides a novel methodology combining hierarchical reinforcement learning with tactical domain knowledge for the autonomous decision-making of UCAVs in complex air combat scenarios. |
| format | Article |
| id | doaj-art-7ef7b3753e80456ab38ab3c6bcd4d135 |
| institution | DOAJ |
| issn | 2504-446X |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Drones |
| doi | 10.3390/drones9050384 |
| citation | Drones, vol. 9, no. 5, art. 384 (2025-05-01) |
| affiliations | Yang Li, Wenhan Dong, Pin Zhang: Aviation Engineering School, Air Force Engineering University, Xi’an 710038, China; Hengang Zhai: Scientific Research and Academic Division, Air Force Engineering University, Xi’an 710038, China; Guangqi Li: The School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China |
| title | Hierarchical Reinforcement Learning with Automatic Curriculum Generation for Unmanned Combat Aerial Vehicle Tactical Decision-Making in Autonomous Air Combat |
| topic | unmanned combat aerial vehicles; within-visual-range air combat; hierarchical reinforcement learning; Wasserstein; maximum-entropy learning framework; temporal abstraction |
| url | https://www.mdpi.com/2504-446X/9/5/384 |
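The hierarchical control scheme the abstract describes — a high-level policy that selects among three tactical options (angle, snapshot, energy) and learns when to terminate the current one, while the selected low-level policy issues manoeuvre commands every step — corresponds to the classic call-and-return option loop. The following is an illustrative sketch, not the authors' code: the policies are random stubs, and all names and the toy environment are assumptions.

```python
# Sketch of option execution with learned termination (call-and-return).
# The learned components of MEOL are replaced here by random placeholders.
import random

TACTICS = ["angle", "snapshot", "energy"]

def high_level_select(state):
    # Stand-in for the high-level option-selection policy (sparse reward).
    return random.choice(TACTICS)

def high_level_terminate(state, tactic):
    # Stand-in for the learned termination condition: True means the
    # current tactic is abandoned and a new one is chosen.
    return random.random() < 0.1

def low_level_action(state, tactic):
    # Stand-in for the tactic's intra-option policy (dense, expert-shaped reward).
    return {"tactic": tactic, "stick": random.uniform(-1.0, 1.0)}

def run_episode(env_step, init_state, max_steps=200):
    """Run one episode; return how many times a tactic was (re)selected."""
    state, tactic, switches = init_state, None, 0
    for _ in range(max_steps):
        if tactic is None or high_level_terminate(state, tactic):
            tactic = high_level_select(state)  # dynamic tactic switching
            switches += 1
        state = env_step(state, low_level_action(state, tactic))
    return switches

# Dummy environment that ignores the action; the tactic is always selected
# at least once (first step), so switches >= 1.
print(run_episode(lambda s, a: s, init_state=0))
```

The point of the structure is what the abstract contrasts with fixed-duration methods: termination is queried every step, so a tactic runs for however long the high-level policy deems it useful rather than for a preset horizon.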
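The WGAN-based curriculum is described only at the level of "generate diverse initial combat conditions to improve training efficiency". One common way to realize such a curriculum is to keep generated conditions whose difficulty is intermediate for the current policy; the sketch below shows that sampling interface with the WGAN generator replaced by a stub. The win-rate band, condition fields, and value ranges are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of curriculum sampling over generated initial conditions.
# The trained WGAN generator and the rollout evaluator are stubbed out.
import random

def generator_sample():
    # Stand-in for the WGAN generator output: a hypothetical initial
    # condition (relative bearing deg, range m, altitude difference m).
    return (random.uniform(-180, 180),
            random.uniform(500, 5000),
            random.uniform(-1000, 1000))

def estimated_win_rate(policy, condition):
    # Stand-in for rollout-based evaluation of the current policy.
    return random.random()

def sample_curriculum(policy, n=32, band=(0.2, 0.8), max_tries=10_000):
    """Keep generated initial conditions of intermediate difficulty,
    i.e. conditions the current policy neither always wins nor always loses."""
    kept = []
    for _ in range(max_tries):
        cond = generator_sample()
        if band[0] <= estimated_win_rate(policy, cond) <= band[1]:
            kept.append(cond)
            if len(kept) == n:
                break
    return kept

batch = sample_curriculum(policy=None, n=8)
print(len(batch))
```

Filtering to a difficulty band is what makes this a curriculum: as the policy improves, previously hard conditions drift into the band and get sampled, gradually covering the diverse initial situations the abstract mentions.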