A Two-Phase Feature Selection Framework for Intrusion Detection System: Balancing Relevance and Computational Efficiency (2P-FSID)
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2025-12-01 |
| Series: | Applied Artificial Intelligence |
| Online Access: | https://www.tandfonline.com/doi/10.1080/08839514.2025.2539396 |
| Summary: | The rapid growth of data demands robust security mechanisms to prevent unauthorized access, making ML-based intrusion detection systems essential. However, high-dimensional data necessitates effective feature selection. This study proposes the Two-Phase Feature Selection framework for Intrusion Detection (2P-FSID) to enhance model performance and interpretability. In Phase 1, a filter-based approach selects a relevant subset of features, yielding an initial subset S1. These features are further assessed using Mutual Information (MI), Correlation (Corr), and Feature Importance (FI) as part of the Feature Relevance Estimation (FRE) process. A hybrid pruning strategy, comprising dynamic pruning and static pruning, then refines the subset into S3. In Phase 2, Shapley Additive Explanations (SHAP) values are computed to quantify each feature’s influence on classification performance. Features are categorized as either positively or negatively influential. The model is initially trained using the positively influential features, and then negatively influential features are iteratively added and evaluated for potential performance improvement, resulting in the final optimized subset S4. Experimental results on the NSL-KDD and UNSW-NB15 datasets demonstrate a reduction in feature space from 41 to 19 and from 44 to 17 features, respectively, while achieving detection accuracies of 95.18% and 92.79%. |
| ISSN: | 0883-9514 1087-6545 |
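
The Phase 2 refinement described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_by_influence` and `refine_subset`, the sign-based split rule, the "keep only if the score strictly improves" criterion, and the toy numbers are all assumptions made for illustration. In practice the `evaluate` callback would train and validate a classifier on the candidate features, and the per-feature scores would come from a SHAP explainer.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def split_by_influence(shap_score: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split features into positively and negatively influential groups.

    Assumption: a feature is "positive" when its aggregated SHAP score is
    greater than zero. The summary does not state the exact rule used.
    """
    positive = [f for f, v in shap_score.items() if v > 0]
    negative = [f for f, v in shap_score.items() if v <= 0]
    return positive, negative


def refine_subset(
    positive: Sequence[str],
    negative: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Start from the positive features, then try each negative feature once.

    A negative feature is kept only if adding it strictly improves the score
    returned by `evaluate` (a hypothetical callback, e.g. validation accuracy).
    """
    selected = list(positive)
    best = evaluate(selected)
    for feature in negative:
        candidate = selected + [feature]
        score = evaluate(candidate)
        if score > best:
            selected, best = candidate, score
    return selected, best


if __name__ == "__main__":
    # Toy values, invented for this example (feature names follow NSL-KDD).
    shap_score = {"duration": 0.12, "src_bytes": 0.08, "flag": -0.01, "land": -0.03}
    gain = {"duration": 0.30, "src_bytes": 0.40, "flag": 0.05, "land": -0.02}

    def toy_evaluate(features: List[str]) -> float:
        # Stand-in for "train a model and measure accuracy".
        return sum(gain[f] for f in features)

    pos, neg = split_by_influence(shap_score)
    final_subset, final_score = refine_subset(pos, neg, toy_evaluate)
    print(final_subset)
```

In this toy run, `flag` is retained because it raises the score, while `land` is rejected because it lowers it. This mirrors the summary's description of negatively influential features being added back only when they improve performance.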