Text Mining and Unsupervised Deep Learning for Intrusion Detection in Smart-Grid Communication Networks
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | IoT |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2624-831X/6/2/22 |
| Summary: | The Manufacturing Message Specification (MMS) protocol is frequently used to automate processes in IEC 61850-based substations and smart-grid systems. However, it may be susceptible to a variety of cyber-attacks. A frequently used protection strategy is to deploy intrusion detection systems to monitor network traffic for anomalies. Conventional approaches to detecting anomalies require a large number of labeled samples and are therefore incompatible with high-dimensional time series data. This work proposes an anomaly detection method for high-dimensional sequences based on a bidirectional LSTM autoencoder. Additionally, a text-mining strategy based on a TF-IDF vectorizer and truncated SVD is presented for data preparation and feature extraction. The proposed data representation approach outperformed word embeddings (Doc2Vec) by better preserving critical domain-specific keywords in MMS traffic while reducing the complexity of model training. Unlike embeddings, which attempt to capture semantic relationships that may not exist in structured network protocols, TF-IDF focuses on token frequency and importance, making it more suitable for anomaly detection in MMS communications. To address the limitations of existing approaches that rely on labeled samples, the proposed model learns the properties and patterns of a large number of normal samples in an unsupervised manner. The results demonstrate that the proposed approach can learn potential features from high-dimensional time series data while maintaining a high True Positive Rate. |
|---|---|
| ISSN: | 2624-831X |
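
The summary describes TF-IDF as weighting tokens by frequency and importance. The following is a minimal sketch of that weighting step only, not the authors' implementation: the message strings are hypothetical, simplified stand-ins for MMS traffic, the `tfidf` helper and the smoothed IDF formula are assumptions made for illustration, and the truncated SVD and bidirectional LSTM autoencoder stages are omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {token: weight} dict per document, using a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of messages that contain each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {}
        for tok, count in counts.items():
            tf = count / len(toks)  # relative frequency within the message
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0  # assumed smoothing
            vec[tok] = tf * idf
        vectors.append(vec)
    return vectors


# Hypothetical message strings for illustration (not taken from the paper's dataset).
messages = [
    "confirmed-request read domain-specific itemid",
    "confirmed-response read success",
    "confirmed-request read domain-specific itemid",
    "confirmed-request write domain-specific itemid",
]
vectors = tfidf(messages)

# Tokens of the last message, ordered from most to least distinctive.
ranked = sorted(vectors[3].items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

In this toy corpus the token `write` occurs in only one message, so it receives a larger weight than `confirmed-request`, which occurs in three. This is the property the summary relies on when it says TF-IDF preserves domain-specific keywords.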