Deep Reinforcement-Learning-Based Air-Combat-Maneuver Generation Framework

Bibliographic Details
Main Authors:	Junru Mei, Ge Li, Hesong Huang
Format:	Article
Language:	English
Published:	MDPI AG 2024-09-01
Series:	Mathematics
Subjects:	air combat; deep reinforcement learning; SAC; recurrent neural network
Online Access:	https://www.mdpi.com/2227-7390/12/19/3020

collection	DOAJ
description	With the development of unmanned aircraft and artificial intelligence technology, air combat is moving toward unmanned and autonomous operation. In this paper, we introduce a new layered decision framework for the six-degrees-of-freedom (6-DOF) aircraft within-visual-range (WVR) air-combat problem. The decision-making process is divided into two layers, each trained separately with reinforcement learning (RL). The upper layer is the combat policy, which determines maneuvering instructions from the current combat situation (such as altitude, speed, and attitude). The lower-layer control policy then converts these instructions into input signals for the aircraft's control effectors (aileron, elevator, rudder, and throttle). The control policy is modeled as a Markov decision process (MDP), and the combat policy is modeled as a partially observable Markov decision process (POMDP). We describe the two-layer training method in detail. For the control policy, we designed rewards based on expert knowledge so that it completes autonomous flight tasks accurately and stably. For the combat policy, we introduce self-play-based curriculum learning, which lets the agent play against its historical policies during training to improve performance. The experimental results show that the proposed method achieves a combat success rate of 85.7% against the game-theory baseline. It is also efficient, reducing training time by 13.6% on average compared with the RL baseline.
id	doaj-art-6d44eb78b0f549c28bac8194d2fce41b
institution	OA Journals
issn	2227-7390
affiliation	College of Systems Engineering, National University of Defense Technology, Changsha 410073, China (all three authors)
doi	10.3390/math12193020
source	Mathematics 2024, vol. 12, no. 19, article 3020
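The abstract describes a two-layer decision loop: a combat policy turns partial observations of the engagement into a maneuver command, and a control policy turns that command plus the aircraft state into aileron, elevator, rudder, and throttle inputs. The sketch below illustrates only that data flow. All names (`Maneuver`, `Controls`, `combat_policy`, `control_policy`, `hierarchical_step`), the gains, and the re-planning period are hypothetical; in the paper both layers are learned with RL, whereas here they are hand-written placeholders. The abstract also does not state how often the combat layer issues commands, so the lower re-planning rate is an assumption.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Maneuver:
    """Hypothetical upper-layer command: target altitude [m], speed [m/s], heading [rad]."""
    altitude: float
    speed: float
    heading: float


@dataclass
class Controls:
    """Lower-layer output: aileron/elevator/rudder in [-1, 1], throttle in [0, 1]."""
    aileron: float
    elevator: float
    rudder: float
    throttle: float


def clip(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def combat_policy(obs_history: deque) -> Maneuver:
    # Placeholder for the learned upper layer. It sees only a window of partial
    # observations (the POMDP setting; a recurrent network could summarize the window).
    # Here: a fixed rule that steers toward the opponent's last observed altitude and bearing.
    last = obs_history[-1]
    return Maneuver(altitude=last["enemy_alt"], speed=250.0, heading=last["enemy_bearing"])


def control_policy(state: dict, cmd: Maneuver) -> Controls:
    # Placeholder for the learned lower layer (MDP over the own-aircraft state).
    # Here: proportional laws with made-up gains, clipped to actuator ranges.
    heading_err = wrap_angle(cmd.heading - state["heading"])
    return Controls(
        aileron=clip(1.5 * heading_err),
        elevator=clip(0.01 * (cmd.altitude - state["alt"])),
        rudder=clip(0.3 * heading_err),
        throttle=clip(0.02 * (cmd.speed - state["speed"]), 0.0, 1.0),
    )


def hierarchical_step(
    t: int,
    obs_history: deque,
    state: dict,
    current_cmd: Optional[Maneuver],
    decision_period: int = 10,  # assumed: re-plan every 10 control steps
) -> Tuple[Maneuver, Controls]:
    # The combat layer re-plans only periodically; the control layer runs every step.
    if current_cmd is None or t % decision_period == 0:
        current_cmd = combat_policy(obs_history)
    return current_cmd, control_policy(state, current_cmd)


history = deque(maxlen=8)  # bounded observation window for the combat layer
history.append({"enemy_alt": 3000.0, "enemy_bearing": math.pi / 2})
state = {"alt": 2950.0, "speed": 240.0, "heading": 0.0}
cmd, u = hierarchical_step(0, history, state, None)
print(u)
```

Keeping the two layers behind separate functions mirrors the separation in the abstract: either placeholder can be swapped for a trained network without touching the other.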

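For the self-play curriculum, the abstract says only that the agent plays against historical policies during training. One common way to organize this is a capped pool of frozen policy snapshots from which each episode's opponent is sampled. The class below is a minimal sketch under that assumption; `OpponentPool`, the pool size, the snapshot interval, and the 50/50 split between the newest snapshot and older ones are illustrative choices, not the authors' confirmed schedule.

```python
import copy
import random


class OpponentPool:
    """Hypothetical pool of frozen policy snapshots for self-play training."""

    def __init__(self, max_size: int = 20, latest_prob: float = 0.5, seed: int = 0):
        self.snapshots = []
        self.max_size = max_size
        self.latest_prob = latest_prob  # chance of facing the newest snapshot
        self.rng = random.Random(seed)

    def add(self, params: dict) -> None:
        # Deep-copy so later updates to the live policy do not alter the snapshot.
        self.snapshots.append(copy.deepcopy(params))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # evict the oldest snapshot

    def sample(self) -> dict:
        if not self.snapshots:
            raise ValueError("pool is empty; add a snapshot first")
        if len(self.snapshots) == 1 or self.rng.random() < self.latest_prob:
            return self.snapshots[-1]
        # Otherwise pick uniformly among the older snapshots.
        return self.rng.choice(self.snapshots[:-1])


if __name__ == "__main__":
    pool = OpponentPool(max_size=5)
    learner = {"version": 0}
    snapshot_every = 10  # assumed snapshot interval
    for iteration in range(50):
        learner["version"] = iteration  # stands in for a gradient update
        if iteration % snapshot_every == 0:
            pool.add(learner)
        opponent = pool.sample()
        # ... run an episode of learner vs. opponent, then update the learner ...
    print([s["version"] for s in pool.snapshots])  # → [0, 10, 20, 30, 40]
```

Mixing the newest snapshot with older ones keeps the learner from overfitting to its current self while still facing progressively stronger opponents as the pool fills.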