Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning
Abstract Supervisory control theory (SCT) is widely used as a safeguard mechanism in the control of discrete event systems (DESs). In complex continuous systems, the supervised control problem of keeping a system’s behavior from violating its specifications is quite different: high-dimensional continuous state and action spaces make automaton languages unsuitable for describing specification information, so the control of real physical systems remains challenging. Reinforcement learning (RL) learns complex decisions automatically through trial and error, but it requires reward functions designed precisely from domain knowledge. For complex scenarios where such a reward function cannot be obtained, or where rewards are sparse, this paper proposes a novel supervised optimal control framework based on trajectory imitation (TI) and reinforcement learning (RL). First, behavior cloning (BC) pre-trains the policy model on a small number of human demonstrations. Second, generative adversarial imitation learning (GAIL) extracts the implicit characteristics of the demonstration data. Then, with the primary and implicit features obtained in these steps, a Demo-based RL algorithm is designed that adds the demonstration data to the RL replay buffer and augments the loss function, raising system performance toward its full potential. Finally, the proposed method is validated in multiple simulation experiments on object-relocation and tool-use tasks with dexterous multi-fingered hands. On the more complex tool-use task, the proposed approach reduces convergence time by 19.7% compared with the latest method, and on both tasks it yields policies with natural movements and higher robustness than the baseline model.
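The first stage named in the abstract, behavior cloning, is ordinary supervised regression of demonstrated actions on states. The following is a minimal sketch of that pre-training step, not the authors' implementation: the PyTorch policy network, the dimensions, and the `demo_states`/`demo_actions` tensors are all placeholder assumptions.

```python
# Minimal behavior-cloning pre-training sketch (assumed PyTorch setup; not the
# paper's code). `demo_states` / `demo_actions` are hypothetical tensors holding
# a small set of human demonstrations, shaped (N, state_dim) and (N, act_dim).
import torch
import torch.nn as nn

state_dim, act_dim = 39, 30  # placeholder dimensions for a dexterous-hand task

policy = nn.Sequential(
    nn.Linear(state_dim, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

demo_states = torch.randn(512, state_dim)   # stand-in for recorded demo states
demo_actions = torch.randn(512, act_dim)    # stand-in for recorded demo actions

for epoch in range(100):
    pred = policy(demo_states)
    # Imitate the demonstrated actions by plain mean-squared-error regression.
    loss = nn.functional.mse_loss(pred, demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The pre-trained weights then initialize the policy for the later GAIL and RL stages, which is what makes a small number of demonstrations sufficient.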
| Main Authors: | Yingjun Liu, Fuchun Liu, Renwei Huang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-06-01 |
| Series: | Scientific Reports |
| Subjects: | Optimal control; Imitation learning; Deep reinforcement learning |
| Online Access: | https://doi.org/10.1038/s41598-025-04417-2 |
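The second stage, GAIL, trains a discriminator to tell demonstration state-action pairs from the policy's own rollouts; the discriminator's output then acts as a learned reward capturing the demonstrations' implicit characteristics. The sketch below uses the standard GAIL discriminator objective, since the record does not give the authors' exact formulation; all names and dimensions are placeholders.

```python
# One GAIL discriminator step (standard formulation, assumed; not the paper's
# code). D(s, a) -> probability the pair came from the policy rather than the
# expert, so -log D(s, a) serves as a surrogate reward for the policy update.
import torch
import torch.nn as nn

state_dim, act_dim = 39, 30  # placeholder dimensions

disc = nn.Sequential(
    nn.Linear(state_dim + act_dim, 256), nn.Tanh(),
    nn.Linear(256, 1),  # raw logit
)
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa: torch.Tensor, policy_sa: torch.Tensor) -> float:
    """expert_sa / policy_sa: (B, state_dim + act_dim) batches of (s, a) pairs."""
    expert_logits = disc(expert_sa)
    policy_logits = disc(policy_sa)
    # Expert pairs labeled 0, policy pairs labeled 1 (one common convention).
    loss = bce(expert_logits, torch.zeros_like(expert_logits)) + \
           bce(policy_logits, torch.ones_like(policy_logits))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def gail_reward(sa: torch.Tensor) -> torch.Tensor:
    # Surrogate reward: large when the discriminator finds the pair expert-like.
    with torch.no_grad():
        return -torch.log(torch.sigmoid(disc(sa)) + 1e-8)
```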
| _version_ | 1850138101366652928 |
|---|---|
| author | Yingjun Liu; Fuchun Liu; Renwei Huang |
| author_facet | Yingjun Liu; Fuchun Liu; Renwei Huang |
| author_sort | Yingjun Liu |
| collection | DOAJ |
| description | Abstract Supervisory control theory (SCT) is widely used as a safeguard mechanism in the control of discrete event systems (DESs). In complex continuous systems, the supervised control problem of keeping a system’s behavior from violating its specifications is quite different: high-dimensional continuous state and action spaces make automaton languages unsuitable for describing specification information, so the control of real physical systems remains challenging. Reinforcement learning (RL) learns complex decisions automatically through trial and error, but it requires reward functions designed precisely from domain knowledge. For complex scenarios where such a reward function cannot be obtained, or where rewards are sparse, this paper proposes a novel supervised optimal control framework based on trajectory imitation (TI) and reinforcement learning (RL). First, behavior cloning (BC) pre-trains the policy model on a small number of human demonstrations. Second, generative adversarial imitation learning (GAIL) extracts the implicit characteristics of the demonstration data. Then, with the primary and implicit features obtained in these steps, a Demo-based RL algorithm is designed that adds the demonstration data to the RL replay buffer and augments the loss function, raising system performance toward its full potential. Finally, the proposed method is validated in multiple simulation experiments on object-relocation and tool-use tasks with dexterous multi-fingered hands. On the more complex tool-use task, the proposed approach reduces convergence time by 19.7% compared with the latest method, and on both tasks it yields policies with natural movements and higher robustness than the baseline model. |
| format | Article |
| id | doaj-art-cdca7a657fcf4e74af0ca82e0080dbbb |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-cdca7a657fcf4e74af0ca82e0080dbbb; 2025-08-20T02:30:39Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-06-01; vol. 15, iss. 1, pp. 1–13; 10.1038/s41598-025-04417-2; Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning; Yingjun Liu, Fuchun Liu (School of Computer Science and Technology, Guangdong University of Technology); Renwei Huang (College of Electronic and Information Engineering, Guangzhou City Polytechnic); abstract as in the description field above; https://doi.org/10.1038/s41598-025-04417-2; Optimal control; Imitation learning; Deep reinforcement learning |
| spellingShingle | Yingjun Liu; Fuchun Liu; Renwei Huang; Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning; Scientific Reports; Optimal control; Imitation learning; Deep reinforcement learning |
| title | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_full | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_fullStr | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_full_unstemmed | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_short | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_sort | supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| topic | Optimal control; Imitation learning; Deep reinforcement learning |
| url | https://doi.org/10.1038/s41598-025-04417-2 |
| work_keys_str_mv | AT yingjunliu supervisedoptimalcontrolincomplexcontinuoussystemswithtrajectoryimitationandreinforcementlearning AT fuchunliu supervisedoptimalcontrolincomplexcontinuoussystemswithtrajectoryimitationandreinforcementlearning AT renweihuang supervisedoptimalcontrolincomplexcontinuoussystemswithtrajectoryimitationandreinforcementlearning |
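The record describes the final Demo-based RL stage only as adding the demonstration data to the replay buffer with an augmented loss function. One common way to realize that description (a DDPGfD/DAPG-style design, assumed here rather than taken from the paper) mixes demonstration transitions into each training batch and adds a behavior-cloning term to the actor objective:

```python
# Sketch of a demonstration-augmented actor update (assumed DDPGfD/DAPG-style
# design; the record does not specify the paper's exact loss). Demonstration
# transitions live in their own buffer, are mixed into every sampled batch, and
# a BC term on demo pairs augments the usual RL actor objective. The
# critic(s, a) -> Q-value signature is also an assumption.
import torch

def actor_loss(policy, critic, rl_batch, demo_batch, bc_weight=0.1):
    """rl_batch / demo_batch: dicts with 'states' and 'actions' tensors."""
    # Mix replay-buffer and demonstration states into one batch.
    states = torch.cat([rl_batch["states"], demo_batch["states"]])
    # Standard deterministic actor objective: maximize Q(s, pi(s)).
    q_loss = -critic(states, policy(states)).mean()
    # Augmented loss: keep the policy close to the demonstrated actions.
    bc_loss = torch.nn.functional.mse_loss(
        policy(demo_batch["states"]), demo_batch["actions"]
    )
    return q_loss + bc_weight * bc_loss
```

In designs of this kind the behavior-cloning weight is typically annealed over training, so the demonstrations shape early exploration without capping the final performance the RL stage can reach.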