Imitation-Reinforcement Learning Penetration Strategy for Hypersonic Vehicle in Gliding Phase
To enhance the penetration capability of hypersonic vehicles in the gliding phase, an intelligent maneuvering penetration strategy combining imitation learning and reinforcement learning is proposed. Firstly, a reinforcement learning penetration model for hypersonic vehicles is established based on...
| Main Authors: | Lei Xu, Yingzi Guan, Jialun Pu, Changzhu Wei |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Aerospace |
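| Subjects: | hypersonic vehicle; penetration in gliding phase; imitation learning; deep reinforcement learning; truncated horizon |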
| Online Access: | https://www.mdpi.com/2226-4310/12/5/438 |
| _version_ | 1849711015507263488 |
|---|---|
| author | Lei Xu; Yingzi Guan; Jialun Pu; Changzhu Wei |
| collection | DOAJ |
| description | To enhance the penetration capability of hypersonic vehicles in the gliding phase, an intelligent maneuvering penetration strategy combining imitation learning and reinforcement learning is proposed. First, a reinforcement learning penetration model for hypersonic vehicles is established as a Markov Decision Process (MDP), with state and action spaces and a composite reward function based on Zero-Effort Miss (ZEM). Next, to overcome the difficulty of training reinforcement learning models, a truncated horizon method is used to integrate reinforcement learning with imitation learning at the level of the optimization target. The result is a Truncated Horizon Imitation Learning Soft Actor–Critic (THIL-SAC) learning model for intelligent penetration strategies, which enables a smooth transition from imitation to exploration. Finally, reward shaping and expert policies are introduced to improve the training process. Simulation results show that the THIL-SAC strategy converges faster than the standard SAC method and outperforms the expert strategies. The THIL-SAC strategy also meets real-time requirements for high-speed penetration scenarios, offering improved adaptability and penetration performance. |
| format | Article |
| id | doaj-art-c5b01d36c16a429dbfbe8fdcd2ae0a01 |
| institution | DOAJ |
| issn | 2226-4310 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Aerospace |
| spelling | Record doaj-art-c5b01d36c16a429dbfbe8fdcd2ae0a01, indexed 2025-08-20T03:14:43Z. Aerospace (MDPI AG), ISSN 2226-4310, 2025-05-01, vol. 12, no. 5, article 438. DOI: 10.3390/aerospace12050438. Authors: Lei Xu, Yingzi Guan, Jialun Pu, Changzhu Wei, all of the School of Astronautics, Harbin Institute of Technology, No. 92 West Dazhi Street, Harbin 150001, China. |
| title | Imitation-Reinforcement Learning Penetration Strategy for Hypersonic Vehicle in Gliding Phase |
| topic | hypersonic vehicle; penetration in gliding phase; imitation learning; deep reinforcement learning; truncated horizon |
| url | https://www.mdpi.com/2226-4310/12/5/438 |