A Dual-Strategy Framework for Cyber Threat Detection in Imbalanced, High-Dimensional Data Across Heterogeneous Networks
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11048903/ |
| Summary: | As cyber threats grow in complexity, robust network security has become increasingly critical. Intrusion Detection Systems (IDS) serve as a key defense mechanism, detecting threats and unauthorized activities that may evade traditional firewalls. However, IDS struggle with highly imbalanced datasets, high-dimensional feature spaces, and computational inefficiency. Existing methods often fail to maintain detection accuracy for minority attack classes in the presence of redundant and irrelevant features, which degrades model performance. This study addresses these gaps with a novel IDS framework that optimizes data balancing, feature selection, and classification. First, the Variance Split Adaptive Sampling KD-SMOTE (VAST-KD-SMOTE) technique addresses data imbalance by under-sampling majority-class instances using a variance-based KD-Tree, which preserves meaningful data patterns while reducing computational cost. Minority-class instances are simultaneously oversampled using SMOTE, with diversity enhanced by selecting k-nearest neighbors from KD-Tree leaf nodes. The quality of the sampled data is validated using Jensen-Shannon Divergence, Silhouette Score, and Davies-Bouldin Index to ensure realistic synthetic samples. Second, the Cauchy-Gaussian Genetic-Arithmetic Optimizer (CG-GAO) addresses high dimensionality by combining a genetic algorithm (GA) with an arithmetic optimization algorithm (AOA), enhancing exploration and preventing premature convergence. The proposed IDS employs Decision Tree, Random Forest, CatBoost, AdaBoost, XGBoost, and Bagging classifiers to improve detection performance. Experiments are conducted on the CICIDS2017, IoTID20, and ToN-IoT datasets. The proposed IDS is evaluated using metrics that target anomaly detection capability: Accuracy, Precision, Detection Rate (DR), Specificity, Miss Rate (MR), False Alarm Rate (FAR), F1-Score, Cohen’s Kappa, MCC, and ROC-AUC. It outperforms conventional AOA and GA, achieving 99.72%, 99.29%, and 99.97% accuracy with the Bagging classifier. The feature selection and data balancing techniques improve Detection Rate and reduce computational overhead by 37%, making the framework a breakthrough in IDS for imbalanced, high-dimensional cybersecurity data. |
|---|---|
| ISSN: | 2169-3536 |
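
The summary lists several confusion-matrix metrics (DR, MR, FAR, Specificity, Cohen's Kappa, MCC) without defining them. The sketch below shows how these are commonly computed for a binary attack/benign split. The formulas are the standard textbook definitions, not taken from the article itself, and the function name `ids_metrics` and the example counts are hypothetical.

```python
import math


def ids_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes "attack" is the positive class and that no denominator is zero.
    These are the usual definitions, not necessarily the article's exact ones.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    detection_rate = tp / (tp + fn)        # also called recall or TPR
    specificity = tn / (tn + fp)           # true negative rate
    miss_rate = fn / (tp + fn)             # 1 - detection rate
    false_alarm_rate = fp / (fp + tn)      # 1 - specificity
    f1 = 2 * precision * detection_rate / (precision + detection_rate)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    return {
        "accuracy": accuracy,
        "precision": precision,
        "DR": detection_rate,
        "specificity": specificity,
        "MR": miss_rate,
        "FAR": false_alarm_rate,
        "F1": f1,
        "kappa": kappa,
        "MCC": mcc,
    }


# Hypothetical counts for an imbalanced test set (100 attacks, 900 benign).
metrics = ids_metrics(tp=90, fp=5, tn=895, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced data, accuracy alone can look high even when many attacks are missed, which is presumably why the summary reports DR, MR, FAR, Kappa, and MCC alongside it.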