A Hybrid Machine Learning-Based Framework for Data Injection Attack Detection in Smart Grids Using PCA and Stacked Autoencoders

Cyberattacks, especially data injection attacks, are becoming more common as smart grids are increasingly interconnected. In addition, accurate and unbiased high-quality data is required for model training. Most of the data we collect from the real world is sparse, incomplete, inconsistent, and skew...

Bibliographic Details
Main Authors:	Shahid Tufail, Hasan Iqbal, Mohd Tariq, Arif I. Sarwat
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Photovoltaic (PV) systems; grid-connected PV systems; machine learning algorithms; random forest; autoencoders; multi-layer perceptron (MLP)
Online Access:	https://ieeexplore.ieee.org/document/10892133/

collection	DOAJ
description	Cyberattacks, especially data injection attacks, are becoming more common as smart grids are increasingly interconnected. In addition, accurate, unbiased, high-quality data is required for model training, yet most data collected from the real world is sparse, incomplete, inconsistent, and skewed. To address these issues, this study proposes a framework to detect such attacks. Synthetic instances of minority-class data were generated using a stacked autoencoder architecture. The generated samples address the class imbalance in the data, improving the generalizability of the model and its coverage of diverse attack scenarios. Various machine learning algorithms were evaluated, and the Random Forest (RF) model consistently achieved superior accuracy, ranging from 99.32% to 95.89%. Traditional algorithms such as Logistic Regression (LR) were sensitive to dimensionality reduction: LR experienced a 16.96% accuracy drop when the number of principal components was reduced from the full set to 10. In contrast, RF was resilient, with only a 1.67% mean accuracy drop under similar conditions. Both RF and XGBoost (XGB) emerged as standout models, with high accuracy and robust performance even under dimensionality reduction via principal component analysis (PCA). However, reducing the number of PCA components from 10 to 5 decreased performance in all models, and the Support Vector Machine (SVM) classifier showed the largest accuracy drop, 14.21%. This study shows the importance of understanding algorithmic behavior and data features and how they can affect the performance of ML models. This analysis will strengthen cybersecurity in smart grids and highlights the critical need for careful feature selection and tuning, particularly for models sensitive to dimensionality reduction.
format	Article
id	doaj-art-ae0ba93fe9954d4ead939ac8b5b9f52f
institution	DOAJ
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-ae0ba93fe9954d4ead939ac8b5b9f52f; 2025-08-20T03:11:15Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 33783-33798; DOI 10.1109/ACCESS.2025.3543751; document 10892133; A Hybrid Machine Learning-Based Framework for Data Injection Attack Detection in Smart Grids Using PCA and Stacked Autoencoders; Shahid Tufail (https://orcid.org/0000-0001-6469-015X); Hasan Iqbal (https://orcid.org/0000-0001-5930-025X); Mohd Tariq (https://orcid.org/0000-0002-5162-7626); Arif I. Sarwat (https://orcid.org/0000-0003-1179-438X); all authors: Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA; https://ieeexplore.ieee.org/document/10892133/; Photovoltaic (PV) systems; grid-connected PV systems; machine learning algorithms; random forest; autoencoders; multi-layer perceptron (MLP)
title	A Hybrid Machine Learning-Based Framework for Data Injection Attack Detection in Smart Grids Using PCA and Stacked Autoencoders
topic	Photovoltaic (PV) systems; grid-connected PV systems; machine learning algorithms; random forest; autoencoders; multi-layer perceptron (MLP)
url	https://ieeexplore.ieee.org/document/10892133/


Similar Items