Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPS

Bibliographic Details
Main Authors: YIN Chan, ZHU Yi, WANG Jinyong, CHEN Xiaoying, HAO Guosheng
Format: Article
Language: Chinese (zho)
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2025-02-01
Series: Jisuanji kexue yu tansuo
Online Access: http://fcst.ceaj.org/fileup/1673-9418/PDF/2312010.pdf
Description
Summary: Deep reinforcement learning is now widely used for decision-making in cyber-physical systems (CPS). However, in unknown environments and on complex tasks, black-box deep reinforcement learning can guarantee neither the safety of the system nor the interpretability of its reward function settings. To address these issues, a safe reinforcement learning method guided by formally verified spatio-temporal rules is proposed. First, the combination-space rule timed communicating sequential process (CSR-TCSP) is proposed to model the system, and the model is verified with the Failures-Divergences Refinement (FDR) model checker in combination with the spatio-temporal specification language (STSL). Second, the structure of the reward state machine is formalized by abstracting the system environment model, yielding the spatio-temporal rule reward machine (STR-RM), which guides the setting of reward functions in reinforcement learning. In addition, to monitor system operation and ensure the safety of output decisions, a monitor and a safe action decision-making algorithm are designed to obtain a safer state-action policy. Finally, the effectiveness of the proposed method is demonstrated on an example of obstacle avoidance and lane-change overtaking in an autonomous driving system.
ISSN: 1673-9418
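
Illustrative sketch: the summary describes a reward machine that guides reward assignment and a monitor that filters unsafe decisions. The Python below is a minimal, generic illustration of those two ideas under stated assumptions, not the STR-RM or the safe action decision-making algorithm from the article. All state names, event names, reward values, and the 10 m gap threshold are hypothetical and chosen only to mirror a lane-change overtaking scenario.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state reward machine: (state, event) -> (next_state, reward)."""
    initial: str
    transitions: dict  # {(state, event): (next_state, reward)}
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial

    def step(self, event):
        # Unlisted (state, event) pairs keep the current state and give zero reward.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def overtaking_reward_machine():
    """Hypothetical reward machine for an overtaking task (values are assumptions)."""
    return RewardMachine(
        initial="follow",
        transitions={
            ("follow", "left_lane_reached"): ("passing", 0.5),
            ("passing", "ahead_of_obstacle"): ("returning", 0.5),
            ("returning", "right_lane_reached"): ("done", 1.0),
            ("follow", "collision"): ("crashed", -10.0),
            ("passing", "collision"): ("crashed", -10.0),
            ("returning", "collision"): ("crashed", -10.0),
        },
    )


def make_monitor(gap_left_m, min_gap_m=10.0):
    """Hypothetical monitor: a left lane change is safe only if the gap is large enough."""
    def is_safe(action):
        return action != "change_left" or gap_left_m >= min_gap_m
    return is_safe


def safe_action(ranked_actions, is_safe, fallback):
    """Return the highest-ranked action the monitor accepts, else the fallback."""
    for action in ranked_actions:
        if is_safe(action):
            return action
    return fallback


if __name__ == "__main__":
    rm = overtaking_reward_machine()
    total = sum(rm.step(e) for e in
                ["left_lane_reached", "ahead_of_obstacle", "right_lane_reached"])
    print(rm.state, total)
    # The learned policy prefers a lane change, but the left gap is too small.
    print(safe_action(["change_left", "keep_lane"], make_monitor(4.0), "brake"))
```

In this sketch the policy's ranked preferences are passed through the monitor before execution, so an unsafe preferred action is replaced by the next acceptable one; the reward machine makes the reward structure explicit as named task stages rather than a single opaque function.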