Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning
Abstract Supervisory control theory (SCT) is widely used as a safeguard mechanism in the control of discrete event systems (DESs). In complex continuous systems, the supervised control problem of keeping a system’s behavior from violating its specifications is quite different: high-dimensional continuous state and action spaces make automaton languages unsuitable for describing specification information, so the control of real physical systems remains challenging. Reinforcement learning (RL) learns complex decisions automatically through trial and error, but it requires reward functions designed precisely from domain knowledge. For complex scenarios where such a reward function cannot be obtained, or where rewards are sparse, this paper proposes a novel supervised optimal control framework based on trajectory imitation (TI) and reinforcement learning (RL). First, behavior cloning (BC) pre-trains the policy model on a small number of human demonstrations. Second, generative adversarial imitation learning (GAIL) extracts the implicit characteristics of the demonstration data. Then, with the primary and implicit features obtained in these steps, a Demo-based RL algorithm is designed that adds the demonstration data to the RL replay buffer and augments the loss function, raising system performance toward its full potential. Finally, the proposed method is validated in multiple simulation experiments on object-relocation and tool-use tasks with dexterous multi-fingered hands. On the more complex tool-use task, the proposed approach reduces convergence time by 19.7% compared with the latest method, and on both tasks it yields policies with natural movements and higher robustness than the baseline model.
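The first stage named in the abstract, behavior cloning, is ordinary supervised regression of demonstrated actions on states. The following is a minimal sketch of that pre-training step, not the authors' implementation: the PyTorch policy network, the dimensions, and the `demo_states`/`demo_actions` tensors are all placeholder assumptions.

```python
# Minimal behavior-cloning pre-training sketch (assumed PyTorch setup; not the
# paper's code). `demo_states` / `demo_actions` are hypothetical tensors holding
# a small set of human demonstrations, shaped (N, state_dim) and (N, act_dim).
import torch
import torch.nn as nn

state_dim, act_dim = 39, 30  # placeholder dimensions for a dexterous-hand task

policy = nn.Sequential(
    nn.Linear(state_dim, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

demo_states = torch.randn(512, state_dim)   # stand-in for recorded demo states
demo_actions = torch.randn(512, act_dim)    # stand-in for recorded demo actions

for epoch in range(100):
    pred = policy(demo_states)
    # Imitate the demonstrated actions by plain mean-squared-error regression.
    loss = nn.functional.mse_loss(pred, demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The pre-trained weights then initialize the policy for the later GAIL and RL stages, which is what makes a small number of demonstrations sufficient.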
| Main Authors: | Yingjun Liu, Fuchun Liu, Renwei Huang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-06-01 |
| Series: | Scientific Reports |
| Subjects: | Optimal control; Imitation learning; Deep reinforcement learning |
| Online Access: | https://doi.org/10.1038/s41598-025-04417-2 |
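The second stage, GAIL, trains a discriminator to tell demonstration state-action pairs from the policy's own rollouts; the discriminator's output then acts as a learned reward capturing the demonstrations' implicit characteristics. The sketch below uses the standard GAIL discriminator objective, since the record does not give the authors' exact formulation; all names and dimensions are placeholders.

```python
# One GAIL discriminator step (standard formulation, assumed; not the paper's
# code). D(s, a) -> probability the pair came from the policy rather than the
# expert, so -log D(s, a) serves as a surrogate reward for the policy update.
import torch
import torch.nn as nn

state_dim, act_dim = 39, 30  # placeholder dimensions

disc = nn.Sequential(
    nn.Linear(state_dim + act_dim, 256), nn.Tanh(),
    nn.Linear(256, 1),  # raw logit
)
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa: torch.Tensor, policy_sa: torch.Tensor) -> float:
    """expert_sa / policy_sa: (B, state_dim + act_dim) batches of (s, a) pairs."""
    expert_logits = disc(expert_sa)
    policy_logits = disc(policy_sa)
    # Expert pairs labeled 0, policy pairs labeled 1 (one common convention).
    loss = bce(expert_logits, torch.zeros_like(expert_logits)) + \
           bce(policy_logits, torch.ones_like(policy_logits))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def gail_reward(sa: torch.Tensor) -> torch.Tensor:
    # Surrogate reward: large when the discriminator finds the pair expert-like.
    with torch.no_grad():
        return -torch.log(torch.sigmoid(disc(sa)) + 1e-8)
```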
| _version_ | 1850138101366652928 |
|---|---|
| author | Yingjun Liu; Fuchun Liu; Renwei Huang |
| author_facet | Yingjun Liu; Fuchun Liu; Renwei Huang |
| author_sort | Yingjun Liu |
| collection | DOAJ |
| description | Abstract Supervisory control theory (SCT) is widely used as a safeguard mechanism in the control of discrete event systems (DESs). In complex continuous systems, the supervised control problem of keeping a system’s behavior from violating its specifications is quite different: high-dimensional continuous state and action spaces make automaton languages unsuitable for describing specification information, so the control of real physical systems remains challenging. Reinforcement learning (RL) learns complex decisions automatically through trial and error, but it requires reward functions designed precisely from domain knowledge. For complex scenarios where such a reward function cannot be obtained, or where rewards are sparse, this paper proposes a novel supervised optimal control framework based on trajectory imitation (TI) and reinforcement learning (RL). First, behavior cloning (BC) pre-trains the policy model on a small number of human demonstrations. Second, generative adversarial imitation learning (GAIL) extracts the implicit characteristics of the demonstration data. Then, with the primary and implicit features obtained in these steps, a Demo-based RL algorithm is designed that adds the demonstration data to the RL replay buffer and augments the loss function, raising system performance toward its full potential. Finally, the proposed method is validated in multiple simulation experiments on object-relocation and tool-use tasks with dexterous multi-fingered hands. On the more complex tool-use task, the proposed approach reduces convergence time by 19.7% compared with the latest method, and on both tasks it yields policies with natural movements and higher robustness than the baseline model. |
| format | Article |
| id | doaj-art-cdca7a657fcf4e74af0ca82e0080dbbb |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-cdca7a657fcf4e74af0ca82e0080dbbb; 2025-08-20T02:30:39Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-06-01; vol. 15, iss. 1, pp. 1–13; 10.1038/s41598-025-04417-2; Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning; Yingjun Liu, Fuchun Liu (School of Computer Science and Technology, Guangdong University of Technology); Renwei Huang (College of Electronic and Information Engineering, Guangzhou City Polytechnic); abstract as in the description field above; https://doi.org/10.1038/s41598-025-04417-2; Optimal control; Imitation learning; Deep reinforcement learning |
| spellingShingle | Yingjun Liu; Fuchun Liu; Renwei Huang; Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning; Scientific Reports; Optimal control; Imitation learning; Deep reinforcement learning |
| title | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_full | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_fullStr | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_full_unstemmed | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_short | Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| title_sort | supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning |
| topic | Optimal control; Imitation learning; Deep reinforcement learning |
| url | https://doi.org/10.1038/s41598-025-04417-2 |
| work_keys_str_mv | AT yingjunliu supervisedoptimalcontrolincomplexcontinuoussystemswithtrajectoryimitationandreinforcementlearning AT fuchunliu supervisedoptimalcontrolincomplexcontinuoussystemswithtrajectoryimitationandreinforcementlearning AT renweihuang supervisedoptimalcontrolincomplexcontinuoussystemswithtrajectoryimitationandreinforcementlearning |
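The record describes the final Demo-based RL stage only as adding the demonstration data to the replay buffer with an augmented loss function. One common way to realize that description (a DDPGfD/DAPG-style design, assumed here rather than taken from the paper) mixes demonstration transitions into each training batch and adds a behavior-cloning term to the actor objective:

```python
# Sketch of a demonstration-augmented actor update (assumed DDPGfD/DAPG-style
# design; the record does not specify the paper's exact loss). Demonstration
# transitions live in their own buffer, are mixed into every sampled batch, and
# a BC term on demo pairs augments the usual RL actor objective. The
# critic(s, a) -> Q-value signature is also an assumption.
import torch

def actor_loss(policy, critic, rl_batch, demo_batch, bc_weight=0.1):
    """rl_batch / demo_batch: dicts with 'states' and 'actions' tensors."""
    # Mix replay-buffer and demonstration states into one batch.
    states = torch.cat([rl_batch["states"], demo_batch["states"]])
    # Standard deterministic actor objective: maximize Q(s, pi(s)).
    q_loss = -critic(states, policy(states)).mean()
    # Augmented loss: keep the policy close to the demonstrated actions.
    bc_loss = torch.nn.functional.mse_loss(
        policy(demo_batch["states"]), demo_batch["actions"]
    )
    return q_loss + bc_weight * bc_loss
```

In designs of this kind the behavior-cloning weight is typically annealed over training, so the demonstrations shape early exploration without capping the final performance the RL stage can reach.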