Temporal-Sequence Offline Reinforcement Learning for Transition Control of a Novel Tilt-Wing Unmanned Aerial Vehicle

A newly designed tilt-wing unmanned aerial vehicle (tilt-wing UAV) requires a unified control strategy across rotary-wing, fixed-wing, and transition modes, which introduces significant challenges. Existing control strategies typically rely on accurate modeling or extensive parameter tuning, which limits their adaptability to dynamically changing flight configurations. Although online reinforcement learning (RL) algorithms offer adaptability, they depend on real-world exploration, posing considerable safety and cost risks for safety-critical UAV applications. To address this challenge, we propose Temporal Sequence Constrained Q-learning (TSCQ), an offline RL framework that integrates an encoder–decoder with recurrent networks to capture temporal dependencies. The policy is constrained by a variational autoencoder to remain close to an offline dataset collected via hardware-in-the-loop simulation, and a sequence-level prediction mechanism is introduced to ensure temporal consistency across action trajectories, thereby mitigating extrapolation error while preserving data fidelity. Experimental results demonstrate that TSCQ significantly outperforms gain scheduling, Model Predictive Control (MPC), and Batch-Constrained Q-learning (BCQ), reducing pitch-angle RMSE by up to 53.3% and vertical-velocity RMSE by approximately 33%. These findings underscore the potential of data-driven, safety-aware offline RL paradigms to enable robust and generalizable control strategies for tilt-wing UAVs.
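The abstract describes TSCQ only at a high level, and the authors' implementation is not included in this record. As a rough, illustrative sketch only, the Python snippet below combines the two ingredients the abstract names: a recurrent encoder over a short state history and a BCQ-style variational-autoencoder constraint that keeps candidate actions close to the offline dataset. All class names, layer sizes, and the candidate-selection step are assumptions made for illustration, not the paper's actual code.

import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    # GRU over a window of past states -> fixed-size context vector (temporal dependencies).
    def __init__(self, state_dim, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRU(state_dim, hidden_dim, batch_first=True)
    def forward(self, state_seq):                      # (batch, T, state_dim)
        _, h = self.gru(state_seq)
        return h[-1]                                   # (batch, hidden_dim)

class ActionVAE(nn.Module):
    # Conditional VAE trained to reconstruct dataset actions given the context, so that
    # sampled candidate actions stay near the behavior data (limits extrapolation error).
    def __init__(self, ctx_dim, action_dim, latent_dim=16):
        super().__init__()
        self.latent_dim = latent_dim
        self.enc = nn.Sequential(nn.Linear(ctx_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(ctx_dim + latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, action_dim), nn.Tanh())
    def forward(self, ctx, action):
        mu, log_std = self.enc(torch.cat([ctx, action], -1)).chunk(2, -1)
        z = mu + log_std.exp() * torch.randn_like(mu)
        return self.dec(torch.cat([ctx, z], -1)), mu, log_std
    def sample(self, ctx):
        z = torch.randn(ctx.size(0), self.latent_dim, device=ctx.device).clamp(-0.5, 0.5)
        return self.dec(torch.cat([ctx, z], -1))

class QNet(nn.Module):
    # Action-value network evaluated on (context, action) pairs.
    def __init__(self, ctx_dim, action_dim):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(ctx_dim + action_dim, 256), nn.ReLU(),
                               nn.Linear(256, 1))
    def forward(self, ctx, action):
        return self.q(torch.cat([ctx, action], -1))

def select_action(encoder, vae, qnet, state_seq, n_candidates=10):
    # BCQ-style selection: sample in-distribution candidates, keep the highest-Q one.
    with torch.no_grad():
        ctx = encoder(state_seq)                       # state_seq: (1, T, state_dim)
        ctx = ctx.repeat(n_candidates, 1)
        candidates = vae.sample(ctx)                   # (n_candidates, action_dim)
        return candidates[qnet(ctx, candidates).argmax(0)]

In such a setup, the VAE and Q-network would be fit on the hardware-in-the-loop dataset offline, and select_action would be called once per control step on the most recent state window; how the paper's sequence-level prediction mechanism is realized on top of this is not specified in the record.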

Bibliographic Details
Main Authors: Shiji Jin, Wenjie Zhao (School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China)
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Aerospace, vol. 12, no. 5, article 435
ISSN: 2226-4310
DOI: 10.3390/aerospace12050435
Subjects: tilt-wing UAV; VTOL UAV; mode transition control; offline reinforcement learning
Online Access: https://www.mdpi.com/2226-4310/12/5/435
Collection: DOAJ
Record ID: doaj-art-8ecf7924aae540a0b824a3a29be0fa33