Memory-augment graph transformer based unsupervised detection model for identifying performance anomalies in highly-dynamic cloud environments

Bibliographic Details
Main Authors: Huangyining Gao, Ruyue Xin, Peng Chen, Xi Li, Ning Lu, Peng You
Format: Article
Language: English
Published: SpringerOpen 2025-07-01
Series: Journal of Cloud Computing: Advances, Systems and Applications
Online Access: https://doi.org/10.1186/s13677-025-00766-5
Description
Summary: Cloud computing systems provide highly available and scalable computing, storage, and network resources to meet diverse service demands. Anomaly detection based on monitoring metrics plays a crucial role in identifying system defects and abnormal behaviors, and thus in ensuring the reliability and stability of cloud services. However, as data complexity grows and concurrent noise affects cloud environments, server failures and abnormal events become more frequent, making anomaly detection more challenging. To address these issues, we propose MemGT, an unsupervised multivariate time series anomaly detection method that offers high accuracy, good robustness, and noise resistance across the varied data patterns of complex cloud environments. Our approach uses a Transformer encoder and dynamic graph structure learning in parallel to extract spatio-temporal features from the monitoring metrics of cloud computing systems. Additionally, we introduce a novel dynamic gated memory module that guides the Transformer encoder in extracting hidden features, improving the model's robustness to changing data patterns in dynamic cloud environments. To accurately distinguish concurrent noise from real anomalies, we use a window-wise graph learning method, which further improves the model's noise resistance. We compared the detection performance of MemGT with 15 baseline methods on 8 public datasets. The experimental results show that our method achieves an average F1 score of 95.04%, surpassing state-of-the-art baseline methods by 24.80%.
ISSN: 2192-113X
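The summary reports results as an average F1 score across datasets. For readers unfamiliar with the metric, the sketch below shows how a point-wise F1 score can be computed once an unsupervised detector's anomaly scores are turned into binary labels, and how per-dataset scores can be averaged. This is an illustrative sketch only, not the authors' evaluation code: the summary does not state which F1 variant (for example point-wise or point-adjusted) or which thresholding strategy was used. The fixed threshold, the unweighted mean, and the helper names `scores_to_labels`, `f1_score`, and `average_f1` are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def scores_to_labels(scores: Sequence[float], threshold: float) -> list:
    """Hypothetical helper: mark a time step anomalous (1) if its score exceeds threshold."""
    return [1 if s > threshold else 0 for s in scores]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Point-wise F1 for binary labels (1 = anomaly, 0 = normal)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # detected anomalies
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed anomalies
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(per_dataset: Dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean of per-dataset F1 scores."""
    return sum(per_dataset.values()) / len(per_dataset)


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    truth = [0, 1, 1, 0, 1, 0]
    scores = [0.1, 0.9, 0.3, 0.2, 0.8, 0.7]
    preds = scores_to_labels(scores, threshold=0.5)
    print(preds, f1_score(truth, preds))
```

In the toy example, two anomalies are caught, one is missed, and one normal point is flagged, so precision and recall are both 2/3 and F1 is 2/3. Because F1 depends on the chosen threshold, unsupervised detectors are often evaluated with a tuned or best-case threshold; whether that applies here cannot be determined from the summary.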