A Hybrid Machine Learning-Based Framework for Data Injection Attack Detection in Smart Grids Using PCA and Stacked Autoencoders
Cyberattacks, especially data injection attacks, are becoming more common as smart grids are increasingly interconnected. In addition, accurate and unbiased high-quality data is required for model training. Most of the data we collect from the real world is sparse, incomplete, inconsistent, and skew...
| Main Authors: | Shahid Tufail, Hasan Iqbal, Mohd Tariq, Arif I. Sarwat |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
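| Subjects: | Photovoltaic (PV) systems; grid-connected PV systems; machine learning algorithms; random forest; autoencoders; multi-layer perceptron (MLP) |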
| Online Access: | https://ieeexplore.ieee.org/document/10892133/ |
| _version_ | 1849722714647953408 |
|---|---|
| author | Shahid Tufail; Hasan Iqbal; Mohd Tariq; Arif I. Sarwat |
| author_facet | Shahid Tufail; Hasan Iqbal; Mohd Tariq; Arif I. Sarwat |
| author_sort | Shahid Tufail |
| collection | DOAJ |
| description | Cyberattacks, especially data injection attacks, are becoming more common as smart grids are increasingly interconnected. In addition, accurate and unbiased high-quality data is required for model training. Most of the data we collect from the real world is sparse, incomplete, inconsistent, and skewed. To address these issues, this study proposes a framework to detect such attacks. Using a stacked autoencoder architecture, synthetic instances of minority class data were generated. The generated instances address the imbalances in the data to enhance the generalizability of the model and cover diverse attack scenarios. Various machine learning algorithms were evaluated, and the Random Forest (RF) model consistently achieved superior accuracy, ranging from 99.32% to 95.89%. In particular, traditional algorithms such as Logistic Regression (LR) exhibited sensitivity to dimensionality reduction, experiencing a 16.96% accuracy drop when the principal components were reduced from all to 10. In contrast, RF demonstrated resilience, with only a 1.67% mean accuracy drop under similar conditions. Both RF and XGBoost (XGB) emerged as standout models, showing high accuracy and robust performance even with dimensionality reduction via principal component analysis (PCA). However, reducing PCA components from 10 to 5 led to performance decreases in all models. The Support Vector Machine (SVM) classifier showed the highest accuracy drop, 14.21%. This study shows the importance of understanding algorithmic behavior and data features and how they can affect the performance of ML models. This analysis will strengthen cybersecurity in smart grids and focuses on the critical need for careful feature selection and tuning, particularly for models sensitive to dimensionality reduction. |
| format | Article |
| id | doaj-art-ae0ba93fe9954d4ead939ac8b5b9f52f |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-ae0ba93fe9954d4ead939ac8b5b9f52f; 2025-08-20T03:11:15Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 33783; 33798; 10.1109/ACCESS.2025.3543751; 10892133; A Hybrid Machine Learning-Based Framework for Data Injection Attack Detection in Smart Grids Using PCA and Stacked Autoencoders; Shahid Tufail (0) https://orcid.org/0000-0001-6469-015X; Hasan Iqbal (1) https://orcid.org/0000-0001-5930-025X; Mohd Tariq (2) https://orcid.org/0000-0002-5162-7626; Arif I. Sarwat (3) https://orcid.org/0000-0003-1179-438X; Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA (all four authors); https://ieeexplore.ieee.org/document/10892133/; Photovoltaic (PV) systems; grid-connected PV systems; machine learning algorithms; random forest; autoencoders; multi-layer perceptron (MLP) |
| spellingShingle | Shahid Tufail; Hasan Iqbal; Mohd Tariq; Arif I. Sarwat; A Hybrid Machine Learning-Based Framework for Data Injection Attack Detection in Smart Grids Using PCA and Stacked Autoencoders; IEEE Access; Photovoltaic (PV) systems; grid-connected PV systems; machine learning algorithms; random forest; autoencoders; multi-layer perceptron (MLP) |
| title | A Hybrid Machine Learning-Based Framework for Data Injection Attack Detection in Smart Grids Using PCA and Stacked Autoencoders |
| title_sort | hybrid machine learning based framework for data injection attack detection in smart grids using pca and stacked autoencoders |
| topic | Photovoltaic (PV) systems; grid-connected PV systems; machine learning algorithms; random forest; autoencoders; multi-layer perceptron (MLP) |
| url | https://ieeexplore.ieee.org/document/10892133/ |
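
The description reports that model accuracy falls as the number of principal components is reduced. One plausible contributing factor is that fewer components retain less of the data's variance. The sketch below illustrates only that effect, using a small pure-Python PCA (power iteration with deflation) on synthetic data. It is not the authors' pipeline: the data, the feature scales, and the helper names (`center`, `covariance`, `top_eigenpairs`, `retained_variance`) are assumptions made for illustration, and it does not reproduce any accuracy figure from the record.

```python
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpairs(cov, k, iters=500):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy; the input is not modified
    rng = random.Random(0)         # fixed seed so the start vectors are repeatable
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue estimate for the converged direction
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        pairs.append((lam, v))
        # Deflation: remove the found component before searching for the next
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return pairs


def retained_variance(rows, k):
    """Fraction of total variance captured by the first k components."""
    cov = covariance(center(rows))
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(lam for lam, _ in top_eigenpairs(cov, k)) / total


# Synthetic data (assumption): six independent features with unequal spread.
data_rng = random.Random(42)
scales = [5.0, 3.0, 2.0, 1.0, 0.5, 0.2]
rows = [[data_rng.gauss(0, s) for s in scales] for _ in range(200)]

for k in (6, 3, 1):
    print(k, round(retained_variance(rows, k), 3))
```

Keeping all six components retains essentially all of the variance, and the retained fraction shrinks as `k` decreases. Whether that loss hurts a given classifier depends on the model and on which directions carry class information, which is consistent with the record's observation that different models show different sensitivity to the reduction.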