Intelligent Online Multiconstrained Reentry Guidance Based on Hindsight Experience Replay

Bibliographic Details
Main Authors: Qingji Jiang, Xiaogang Wang, Yuliang Bai, Yu Li
Format: Article
Language: English
Published: Wiley 2023-01-01
Series: International Journal of Aerospace Engineering
Online Access: http://dx.doi.org/10.1155/2023/5883080
Description
Summary: Traditional guidance algorithms for hypersonic glide vehicles struggle to meet real-time requirements while remaining robust to multiple deviations or tasks. This paper proposes an intelligent online multiconstrained reentry guidance method that substantially reduces the computational burden and remains effective under multiple constraints. First, a reentry simulation environment is built, including the dynamics, the multiple constraints, and the control variables. Unlike traditional decoupling methods, the bank angle command, including both its magnitude and its sign, is designed as the sole guidance variable. Second, a policy neural network is designed to output end-to-end guidance commands. By formulating the reentry process as a Markov Decision Process (MDP), the policy network can be trained with deep reinforcement learning (DRL). To address the sparse reward problem caused by the multiple constraints, an improved Hindsight Experience Replay (HER) method is adaptively combined with the Deep Deterministic Policy Gradient (DDPG) algorithm by transforming the multiple constraints into multiple goals. As a result, the novel training algorithm makes better use of failed data and converges faster. Finally, simulations of typical scenarios show that the policy network in the proposed guidance outputs effective commands in much less time than the traditional method. The guidance is robust to initial bias, different targets, and online aerodynamic deviation.
ISSN: 1687-5974
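
The summary's idea of turning constraints into goals so that failed trajectories still produce useful training data can be illustrated with a minimal sketch of standard hindsight relabeling (the "final" strategy). This is not the authors' improved HER variant, which the record does not describe. The `Transition` fields, the two-component goal, the tolerances, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    state: Vector
    action: float          # hypothetical signed bank angle command (rad)
    next_state: Vector
    achieved_goal: Vector  # goal quantities actually reached after this step
    goal: Vector           # goal quantities the episode was aiming for
    reward: float


def sparse_reward(achieved: Vector, goal: Vector, tol: Vector) -> float:
    """Return 0 if every goal component is within tolerance, otherwise -1."""
    ok = all(abs(a - g) <= t for a, g, t in zip(achieved, goal, tol))
    return 0.0 if ok else -1.0


def relabel_final(episode: List[Transition], tol: Vector) -> List[Transition]:
    """Standard HER 'final' strategy: treat the last achieved goal as the goal."""
    new_goal = episode[-1].achieved_goal
    return [
        Transition(t.state, t.action, t.next_state, t.achieved_goal, new_goal,
                   sparse_reward(t.achieved_goal, new_goal, tol))
        for t in episode
    ]


# Hypothetical failed episode: the intended goal is never reached.
intended = (20.0, 1000.0)   # assumed (altitude in km, speed in m/s)
tol = (0.5, 50.0)           # assumed per-component tolerances
reached = [(30.0, 2000.0), (25.0, 1500.0), (22.0, 1100.0)]

episode = [
    Transition((float(i),), 0.1, (float(i + 1),), g, intended,
               sparse_reward(g, intended, tol))
    for i, g in enumerate(reached)
]
relabeled = relabel_final(episode, tol)

print([t.reward for t in episode])    # every step fails the intended goal
print([t.reward for t in relabeled])  # last step succeeds under the new goal
```

In an off-policy method such as DDPG, both the original and the relabeled transitions would typically be stored in the replay buffer, so that an episode that missed its intended goal still contributes at least one non-failing reward signal.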