Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning

Abstract: Supervisory control theory (SCT) is widely used as a safeguard mechanism in the control of discrete event systems (DESs). In complex continuous systems, the supervised control problem of keeping the system's behavior from violating specifications is quite different: high-dimensional continuous state and action spaces make automaton languages unsuitable for describing specifications, so supervision of real physical systems remains challenging. Reinforcement learning (RL) learns complex decisions automatically through trial and error, but it requires reward functions designed precisely with domain knowledge. For complex scenarios where a reward function cannot be specified, or where only sparse rewards are available, this paper proposes a novel supervised optimal control framework based on trajectory imitation (TI) and reinforcement learning. First, behavior cloning (BC) pre-trains the policy model on a small number of human demonstrations. Second, generative adversarial imitation learning (GAIL) extracts the implicit characteristics of the demonstration data. With the primary and implicit features obtained from these two steps, a Demo-based RL algorithm is then designed that adds the demonstration data to the RL replay buffer and augments the loss function, pushing system performance toward its maximum potential. Finally, the proposed method is validated in multiple simulation experiments on object-relocation and tool-use tasks with dexterous multi-fingered hands. On the more complex tool-use task, the proposed approach reduces convergence time by 19.7% compared with the latest method, and on both tasks it yields policies that display natural movements and higher robustness than the baseline model.
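For orientation, the pipeline described in the abstract (BC pre-training, GAIL feature extraction, demonstration-augmented RL) follows a pattern common in dexterous-manipulation work. The sketch below is a minimal PyTorch illustration of stages 1 and 3 only, under the assumption of a Gaussian policy and a DAPG-style decaying imitation term; every name in it (PolicyNet, pretrain_bc, augmented_policy_loss, lam0, lam1) is hypothetical rather than taken from the paper, and the GAIL discriminator stage is omitted for brevity.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    # Gaussian policy over a continuous action space (hypothetical architecture).
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return torch.distributions.Normal(self.mean(obs), self.log_std.exp())

def pretrain_bc(policy, demo_obs, demo_act, epochs=100, lr=1e-3):
    # Stage 1: behavior cloning, maximizing the likelihood of demonstrated actions.
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = -policy.dist(demo_obs).log_prob(demo_act).sum(-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

def augmented_policy_loss(policy, obs, act, adv, demo_obs, demo_act,
                          lam0=0.1, lam1=0.95, step=0):
    # Stage 3: policy-gradient loss on rollout data plus a decaying
    # imitation term on demonstration data kept alongside the replay buffer.
    rl_loss = -(policy.dist(obs).log_prob(act).sum(-1) * adv).mean()
    bc_loss = -policy.dist(demo_obs).log_prob(demo_act).sum(-1).mean()
    return rl_loss + lam0 * (lam1 ** step) * bc_loss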

Bibliographic Details
Main Authors: Yingjun Liu, Fuchun Liu, Renwei Huang
Affiliations: School of Computer Science and Technology, Guangdong University of Technology (Yingjun Liu, Fuchun Liu); College of Electronic and Information Engineering, Guangzhou City Polytechnic (Renwei Huang)
Format: Article
Language: English
Published: Nature Portfolio, 2025-06-01
Series: Scientific Reports
ISSN: 2045-2322
Subjects: Optimal control; Imitation learning; Deep reinforcement learning
Collection: DOAJ
Online Access: https://doi.org/10.1038/s41598-025-04417-2