Energy Optimal Trajectory Planning for the Morphing Solar-Powered Unmanned Aerial Vehicle Based on Hierarchical Reinforcement Learning

Bibliographic Details
Main Authors: Tichao Xu, Wenyue Meng, Jian Zhang
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Drones
Online Access: https://www.mdpi.com/2504-446X/9/7/498
Description
Summary: Trajectory planning is crucial for solar aircraft endurance. A multi-wing morphing solar aircraft can increase solar energy acquisition through wing deflection, but the deflection also incurs aerodynamic losses. This complicates the energy coupling and challenges existing planning methods in both efficiency and long-term optimization. This study presents an energy-optimal trajectory planning method based on Hierarchical Reinforcement Learning for morphing solar-powered Unmanned Aerial Vehicles (UAVs), exemplified by a Λ-shaped aircraft. The method trains a hierarchical policy to autonomously track energy peaks. A top-level decision policy selects the appropriate bottom-level policy based on energy factors, and the bottom-level policies generate control commands such as thrust, attitude angles, and wing deflection angles. With properly shaped reward functions and training conditions, the hierarchical policy enables the UAV to adapt to changing flight conditions and achieve autonomous flight with energy maximization. In 24 h simulation flights on the summer solstice, the hierarchical policy switched appropriately between its bottom-level policies during daytime and generated real-time control commands that satisfy the optimal energy power requirements. Compared with the minimum energy consumption benchmark case, the proposed hierarchical policy achieved 0.98 h more full-charge high-altitude cruise time and 1.92% more remaining battery energy after 24 h, demonstrating superior energy optimization capabilities. In addition, generalization testing demonstrated the strong adaptability of the hierarchical policy to different quarterly dates.
ISSN: 2504-446X
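
Illustrative sketch: The two-level structure described in the summary (a top-level policy that picks a bottom-level policy from energy factors, and bottom-level policies that emit thrust, attitude, and wing deflection commands) can be sketched roughly as follows. This is not the authors' implementation. In the article both levels are learned with reinforcement learning, whereas here hand-written rules stand in for the trained policies. All names, thresholds, and command values (`EnergyState`, `select_policy`, `harvest`, `climb`, `glide`, the 0.99 charge threshold) are hypothetical and chosen only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class EnergyState:
    """Hypothetical energy factors seen by the top-level policy."""
    solar_power_w: float  # harvested solar power, watts (assumed input)
    battery_soc: float    # battery state of charge in [0, 1] (assumed input)


@dataclass(frozen=True)
class Command:
    """Control commands of the kinds named in the summary."""
    thrust: float               # normalized thrust in [0, 1]
    pitch_deg: float            # attitude: pitch angle
    bank_deg: float             # attitude: bank angle
    wing_deflection_deg: float  # morphing wing deflection angle


BottomPolicy = Callable[[EnergyState], Command]


# Stand-ins for trained bottom-level policies (values are illustrative only).
def harvest(state: EnergyState) -> Command:
    # Deflect the wings toward the sun while holding level flight.
    return Command(thrust=0.4, pitch_deg=0.0, bank_deg=0.0, wing_deflection_deg=15.0)


def climb(state: EnergyState) -> Command:
    # Spend surplus solar power on gaining altitude.
    return Command(thrust=0.9, pitch_deg=5.0, bank_deg=0.0, wing_deflection_deg=0.0)


def glide(state: EnergyState) -> Command:
    # No sunlight: reduce thrust and keep the wings flat.
    return Command(thrust=0.1, pitch_deg=-2.0, bank_deg=0.0, wing_deflection_deg=0.0)


POLICIES: Dict[str, BottomPolicy] = {"harvest": harvest, "climb": climb, "glide": glide}


def select_policy(state: EnergyState) -> str:
    """Stand-in for the learned top-level decision policy (assumed rules)."""
    if state.solar_power_w <= 0.0:
        return "glide"
    if state.battery_soc >= 0.99:
        return "climb"
    return "harvest"


def hierarchical_step(state: EnergyState) -> Tuple[str, Command]:
    """One decision step: choose a bottom-level policy, then query it."""
    name = select_policy(state)
    return name, POLICIES[name](state)


if __name__ == "__main__":
    for s in (EnergyState(300.0, 0.5), EnergyState(300.0, 1.0), EnergyState(0.0, 0.8)):
        print(s, "->", hierarchical_step(s))
```

Keeping the selector separate from the bottom-level policies mirrors the hierarchy in the summary: each level could be swapped for a trained network without changing `hierarchical_step`.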