Efficient network attack path optimization method based on prior knowledge-based PPO algorithm

Bibliographic Details
Main Authors: Qiuxiang Li, Jianping Wu
Format: Article
Language: English
Published: SpringerOpen 2025-03-01
Series: Cybersecurity
Online Access: https://doi.org/10.1186/s42400-024-00288-8
Description
Summary: With the increasing complexity of network system architectures and advances in artificial intelligence, automated penetration testing is of crucial significance for enhancing network security, and attack path optimization is the key to automating it. However, existing methods mostly focus on maximizing attack gain while overlooking the impact of attack path length. To address this issue, we propose a bi-objective attack path optimization problem based on attack graphs, aiming to maximize attack gain while minimizing attack path length. We model the problem as a Markov decision process (MDP) and theoretically analyze the selection of the discount factor to ensure consistency between the goal of the MDP and the optimization objective. To address the excessive invalid actions and poor training performance of current deep reinforcement learning-based attack path optimization methods, we propose a Prior Knowledge-based Proximal Policy Optimization (PKPPO) algorithm. The algorithm first designs action filtering rules based on prior knowledge extracted from the attack graph; based on these rules, action mask vectors are generated before each action sampling step to modify the output distribution of the policy network. This approach effectively prunes the action space, enhancing the performance and efficiency of the algorithm in large-scale, complex network scenarios. Experimental results show that PKPPO converges within 100 training episodes in all test scenarios; its convergence speed and stability are significantly better than those of the baselines, and it remains effective and adaptable in large-scale network environments.
ISSN: 2523-3246
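
The mechanism described in the summary, generating an action mask from attack-graph prior knowledge and using it to modify the policy network's output distribution before sampling, can be illustrated with a short Python sketch. This is a minimal, hypothetical example of masked categorical sampling for a PPO-style policy; the names PolicyNet and masked_action_distribution and the hand-written mask are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNet(nn.Module):
    # Small policy network mapping a state vector to action logits.
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def masked_action_distribution(logits, mask):
    # Push the logits of invalid actions to the most negative representable
    # value so their post-softmax probability underflows to zero.
    very_negative = torch.finfo(logits.dtype).min
    masked_logits = torch.where(mask.bool(), logits,
                                torch.full_like(logits, very_negative))
    return Categorical(logits=masked_logits)

# Usage sketch: sample only actions the attack graph currently allows.
state_dim, n_actions = 16, 8
policy = PolicyNet(state_dim, n_actions)
state = torch.randn(1, state_dim)
# Hypothetical mask from attack-graph prior knowledge: 1 = valid, 0 = invalid.
mask = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]])
dist = masked_action_distribution(policy(state), mask)
action = dist.sample()             # only indices where mask == 1 can be drawn
log_prob = dist.log_prob(action)   # reused in the PPO surrogate objective

On the bi-objective side, if each step carries a small cost in addition to the attack gain, a discount factor strictly below one makes longer paths less valuable; this is one plausible reading of the consistency between the MDP goal and the two optimization objectives that the discount-factor analysis addresses, though the exact reward design is not given in the abstract.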