A Dual-Strategy Framework for Cyber Threat Detection in Imbalanced, High-Dimensional Data Across Heterogeneous Networks

Bibliographic Details
Main Authors: T. Saranya, S. Indra Priyadharshini
Format: Article
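Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Intrusion detection system (IDS); advanced data balancing technique; arithmetic optimization algorithm (AOA); ensemble machine learning; genetic algorithm; hybrid feature selection
Online Access: https://ieeexplore.ieee.org/document/11048903/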
author T. Saranya
S. Indra Priyadharshini
collection DOAJ
description As cyber threats grow in complexity, robust network security has become increasingly critical. Intrusion Detection Systems (IDS) serve as a key defense mechanism, detecting threats and unauthorized activities that may evade traditional firewalls. However, IDS struggle with highly imbalanced datasets, high-dimensional feature spaces, and computational inefficiencies. Existing methods often fail to maintain detection accuracy for minority attack classes while handling redundant and irrelevant features, which degrades model performance. This study addresses these gaps with a novel IDS framework that optimizes data balancing, feature selection, and classification. First, the Variance Split Adaptive Sampling KD-SMOTE (VAST-KD-SMOTE) technique addresses data imbalance by under-sampling majority class instances using a variance-based KD-Tree, which preserves meaningful data patterns while reducing computational cost. Minority class instances are simultaneously oversampled using SMOTE, with diversity enhanced by selecting k-nearest neighbors from KD-Tree leaf nodes. The quality of the sampled data is validated using Jensen-Shannon Divergence, Silhouette Score, and Davies-Bouldin Index to ensure realistic synthetic sample generation. Second, the Cauchy-Gaussian Genetic-Arithmetic Optimizer (CG-GAO) addresses high dimensionality by combining a genetic algorithm (GA) with an arithmetic optimization algorithm (AOA), enhancing exploration and preventing premature convergence. The proposed IDS employs classifiers such as Decision Tree, Random Forest, CatBoost, AdaBoost, XGBoost, and Bagging Classifier to improve detection performance. Experiments are conducted on the CICIDS2017, IoTID20, and ToN-IoT datasets. The proposed IDS is evaluated with metrics suited to anomaly detection: Accuracy, Precision, Detection Rate (DR), Specificity, Miss Rate (MR), False Alarm Rate (FAR), F1-Score, Cohen’s Kappa, MCC, and ROC-AUC. It outperforms conventional AOA and GA, achieving 99.72%, 99.29%, and 99.97% accuracy with the Bagging classifier. The feature selection and data balancing techniques improve the Detection Rate and reduce computational overhead by 37%, making the framework a breakthrough in IDS for imbalanced, high-dimensional cybersecurity data.
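The abstract names its evaluation metrics without defining them. As a rough illustration only, the sketch below computes several of them from binary confusion-matrix counts using the standard textbook definitions, with "positive" taken to mean attack traffic. The `Confusion` class, the helper names, and the example counts are hypothetical and do not come from the article; the authors may compute these metrics per class or with averaging across classes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts; 'positive' means attack traffic (assumption)."""
    tp: int  # attacks flagged as attacks
    fp: int  # benign flows flagged as attacks
    tn: int  # benign flows passed as benign
    fn: int  # attacks passed as benign


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class is absent from the sample.
    return num / den if den else 0.0


def detection_rate(c: Confusion) -> float:
    """DR (recall): share of attacks that were detected."""
    return _ratio(c.tp, c.tp + c.fn)


def miss_rate(c: Confusion) -> float:
    """MR: share of attacks that were missed, i.e. 1 - DR."""
    return _ratio(c.fn, c.tp + c.fn)


def false_alarm_rate(c: Confusion) -> float:
    """FAR: share of benign flows wrongly flagged as attacks."""
    return _ratio(c.fp, c.fp + c.tn)


def specificity(c: Confusion) -> float:
    """Share of benign flows correctly passed, i.e. 1 - FAR."""
    return _ratio(c.tn, c.fp + c.tn)


def precision(c: Confusion) -> float:
    """Share of raised alarms that were real attacks."""
    return _ratio(c.tp, c.tp + c.fp)


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and detection rate."""
    p, r = precision(c), detection_rate(c)
    return _ratio(2 * p * r, p + r)


def mcc(c: Confusion) -> float:
    """Matthews correlation coefficient, in [-1, 1]."""
    num = c.tp * c.tn - c.fp * c.fn
    den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return _ratio(num, den)


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced sample: 100 attacks, 900 benign flows.
    example = Confusion(tp=90, fp=5, tn=895, fn=10)
    for fn_ in (detection_rate, miss_rate, false_alarm_rate,
                specificity, precision, f1_score, mcc):
        print(f"{fn_.__name__}: {fn_(example):.4f}")
```

On imbalanced data, accuracy alone can look high even when many attacks are missed, which is presumably why the record lists DR, MR, FAR, and MCC alongside it.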
format Article
id doaj-art-f5c2881bdf9840c2b5b43d79d99f4bfb
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-f5c2881bdf9840c2b5b43d79d99f4bfb; 2025-08-20T03:13:43Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 125313-125331; DOI 10.1109/ACCESS.2025.3582788; IEEE Xplore document 11048903
T. Saranya (https://orcid.org/0009-0006-1202-9486), School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
S. Indra Priyadharshini (https://orcid.org/0000-0002-0891-1605), School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
title A Dual-Strategy Framework for Cyber Threat Detection in Imbalanced, High-Dimensional Data Across Heterogeneous Networks
topic Intrusion detection system (IDS)
advanced data balancing technique
arithmetic optimization algorithm (AOA)
ensemble machine learning
genetic algorithm
hybrid feature selection
url https://ieeexplore.ieee.org/document/11048903/