Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPS

Deep reinforcement learning is currently a commonly used method in decision-making for cyber physical system (CPS). However, when facing an unknown environment and dealing with complex tasks, deep reinforcement learning based on black boxes cannot guarantee the security of the system and the interpr...

Full description

Saved in:
Bibliographic Details
Main Author: YIN Chan, ZHU Yi, WANG Jinyong, CHEN Xiaoying, HAO Guosheng
Format: Article
Language:zho
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2025-02-01
Series:Jisuanji kexue yu tansuo
Subjects:
Online Access:http://fcst.ceaj.org/fileup/1673-9418/PDF/2312010.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1850152196941807616
author YIN Chan, ZHU Yi, WANG Jinyong, CHEN Xiaoying, HAO Guosheng
author_facet YIN Chan, ZHU Yi, WANG Jinyong, CHEN Xiaoying, HAO Guosheng
author_sort YIN Chan, ZHU Yi, WANG Jinyong, CHEN Xiaoying, HAO Guosheng
collection DOAJ
description Deep reinforcement learning is currently a commonly used method in decision-making for cyber physical system (CPS). However, when facing an unknown environment and dealing with complex tasks, deep reinforcement learning based on black boxes cannot guarantee the security of the system and the interpretability of reward function settings. To address the above issues, a formalized spatio-temporal rule verification-guided safe reinforcement learning method is proposed. Firstly, the combination-space rule timed communicating sequential process (CSR-TCSP) is proposed to model the system. Then it is validated by failure divergence refinement (FDR) which is a model checker combined with the spatio-temporal specification language (STSL). Secondly, the structure of the reward state machine is formalized by abstracting the system environment model to propose the spatio-temporal rule reward machine (STR-RM) which can guide the setting of reward functions in reinforcement learning. In addition, to monitor system operation and ensure the safety of output decisions, a monitor and a safe action decision-making algorithm are designed to obtain a more secure state-action strategy. Finally, the effectiveness of the proposed method is demonstrated through an example of obstacle avoidance and lane-changing overtaking in the autonomous driving system.
format Article
id doaj-art-196b0f92dedc42cda4f88b3abf3f00aa
institution OA Journals
issn 1673-9418
language zho
publishDate 2025-02-01
publisher Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
record_format Article
series Jisuanji kexue yu tansuo
spelling doaj-art-196b0f92dedc42cda4f88b3abf3f00aa2025-08-20T02:26:03ZzhoJournal of Computer Engineering and Applications Beijing Co., Ltd., Science PressJisuanji kexue yu tansuo1673-94182025-02-0119251352710.3778/j.issn.1673-9418.2312010Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPSYIN Chan, ZHU Yi, WANG Jinyong, CHEN Xiaoying, HAO Guosheng01. College of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China 2. School of Information Engineering, Xuzhou Institute of Technology, Xuzhou, Jiangsu 221018, China 3. Department of Computer Science and Technology, Nanjing University, Nanjing 210023, ChinaDeep reinforcement learning is currently a commonly used method in decision-making for cyber physical system (CPS). However, when facing an unknown environment and dealing with complex tasks, deep reinforcement learning based on black boxes cannot guarantee the security of the system and the interpretability of reward function settings. To address the above issues, a formalized spatio-temporal rule verification-guided safe reinforcement learning method is proposed. Firstly, the combination-space rule timed communicating sequential process (CSR-TCSP) is proposed to model the system. Then it is validated by failure divergence refinement (FDR) which is a model checker combined with the spatio-temporal specification language (STSL). Secondly, the structure of the reward state machine is formalized by abstracting the system environment model to propose the spatio-temporal rule reward machine (STR-RM) which can guide the setting of reward functions in reinforcement learning. In addition, to monitor system operation and ensure the safety of output decisions, a monitor and a safe action decision-making algorithm are designed to obtain a more secure state-action strategy. Finally, the effectiveness of the proposed method is demonstrated through an example of obstacle avoidance and lane-changing overtaking in the autonomous driving system.http://fcst.ceaj.org/fileup/1673-9418/PDF/2312010.pdfcyber physical system; formal method; process algebra; safe reinforcement learning; autonomous driving
spellingShingle YIN Chan, ZHU Yi, WANG Jinyong, CHEN Xiaoying, HAO Guosheng
Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPS
Jisuanji kexue yu tansuo
cyber physical system; formal method; process algebra; safe reinforcement learning; autonomous driving
title Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPS
title_full Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPS
title_fullStr Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPS
title_full_unstemmed Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPS
title_short Formal Verification of Spatio-Temporal Rules Guided Safe Reinforcement Learning for CPS
title_sort formal verification of spatio temporal rules guided safe reinforcement learning for cps
topic cyber physical system; formal method; process algebra; safe reinforcement learning; autonomous driving
url http://fcst.ceaj.org/fileup/1673-9418/PDF/2312010.pdf
work_keys_str_mv AT yinchanzhuyiwangjinyongchenxiaoyinghaoguosheng formalverificationofspatiotemporalrulesguidedsafereinforcementlearningforcps