A Hybrid Machine Learning-Based Framework for Data Injection Attack Detection in Smart Grids Using PCA and Stacked Autoencoders

Bibliographic Details
Main Authors: Shahid Tufail, Hasan Iqbal, Mohd Tariq, Arif I. Sarwat
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10892133/
Description: Cyberattacks, especially data injection attacks, are becoming more common as smart grids become increasingly interconnected. In addition, model training requires accurate, unbiased, high-quality data, yet most data collected from the real world is sparse, incomplete, inconsistent, and skewed. To address these issues, this study proposes a framework to detect such attacks. A stacked autoencoder architecture was used to generate synthetic instances of minority class data. The generated instances address the imbalances in the data, improving the generalizability of the model and covering diverse attack scenarios. Various machine learning algorithms were evaluated, and the Random Forest (RF) model consistently achieved superior accuracy, ranging from 95.89% to 99.32%. In particular, traditional algorithms such as Logistic Regression (LR) were sensitive to dimensionality reduction, with a 16.96% accuracy drop when the principal components were reduced from all to 10. In contrast, RF was resilient, with only a 1.67% mean accuracy drop under similar conditions. Both RF and XGBoost (XGB) emerged as standout models, showing high accuracy and robust performance even with dimensionality reduction via principal component analysis (PCA). However, reducing the PCA components from 10 to 5 decreased performance in all models, and the Support Vector Machine (SVM) classifier showed the largest accuracy drop, 14.21%. This study shows the importance of understanding algorithmic behavior and data features and how they can affect the performance of ML models. The analysis aims to strengthen cybersecurity in smart grids, focusing on the critical need for careful feature selection and tuning, particularly for models sensitive to dimensionality reduction.
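The comparison described in the abstract (classifier accuracy with all principal components, then 10, then 5) can be illustrated with a minimal sketch. This is not the authors' code: the synthetic dataset from `make_classification`, the helper name `pca_sweep`, the 70/30 split, and all hyperparameters are assumptions chosen only to make the example self-contained, and its accuracies will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def pca_sweep(X, y, component_counts=(None, 10, 5), seed=0):
    """Hypothetical helper: return {model: {n_components: test accuracy}}.

    n_components=None tells scikit-learn's PCA to keep all components.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Factories so that every run starts from a fresh, unfitted model.
    models = {
        "LR": lambda: LogisticRegression(max_iter=1000),
        "RF": lambda: RandomForestClassifier(n_estimators=50, random_state=seed),
    }
    results = {}
    for name, make_model in models.items():
        results[name] = {}
        for k in component_counts:
            # Scale, project onto k principal components, then classify.
            pipe = make_pipeline(
                StandardScaler(),
                PCA(n_components=k, random_state=seed),
                make_model(),
            )
            pipe.fit(X_train, y_train)
            results[name][k] = pipe.score(X_test, y_test)
    return results


# Synthetic stand-in data (an assumption; not the dataset used in the paper).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=12, random_state=0
)
results = pca_sweep(X, y)
for name, scores in results.items():
    for k, acc in scores.items():
        label = "all" if k is None else k
        print(f"{name} with {label} components: accuracy = {acc:.3f}")
```

Fitting the scaler and PCA inside the pipeline keeps the projection learned from the training split only, so the held-out accuracy is not influenced by test data.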
ISSN: 2169-3536
Volume: 13
Pages: 33783-33798
DOI: 10.1109/ACCESS.2025.3543751
Author ORCIDs:
Shahid Tufail: https://orcid.org/0000-0001-6469-015X
Hasan Iqbal: https://orcid.org/0000-0001-5930-025X
Mohd Tariq: https://orcid.org/0000-0002-5162-7626
Arif I. Sarwat: https://orcid.org/0000-0003-1179-438X
Affiliation (all authors): Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA
Subjects: Photovoltaic (PV) systems, grid-connected PV systems, machine learning algorithms, random forest, autoencoders, multi-layer perceptron (MLP)