Improving Model Performance for Predicting Exfiltration Attacks Through Resampling Strategies

Bibliographic Details
Main Authors: Arif Rahman Hakim, Kalamullah Ramli, Muhammad Salman, Esti Rahmawati Agustina
Format: Article
Language: English
Published: IIUM Press, International Islamic University Malaysia 2025-01-01
Series: International Islamic University Malaysia Engineering Journal
Online Access: https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/3547
Description
Summary: Addressing class imbalance is critical in cybersecurity applications, particularly in scenarios such as exfiltration detection, where skewed datasets lead to biased predictions and poor generalization for minority classes. This study investigates five Synthetic Minority Oversampling Technique (SMOTE) variants (BorderlineSMOTE, KMeansSMOTE, SMOTEENC, SMOTEENN, and SMOTETomek) to mitigate the severe class imbalance in our customized tactic-labeled dataset, which exhibits dominant majority-class influence and weak class separability. We use seven imbalance metrics to assess each SMOTE variant's impact on class-distribution stability and separability. Furthermore, we evaluate model performance across five classifiers: Logistic Regression, Naïve Bayes, Support Vector Machine, Random Forest, and XGBoost. Findings reveal that SMOTEENN consistently improves the performance metrics (accuracy, precision, recall, F1-score, and geometric mean), reaching an average of 99% across most classifiers and establishing itself as the most adaptable variant for handling imbalance. This study provides a comprehensive framework for selecting resampling strategies to enhance classification efficacy in cybersecurity tasks with imbalanced data.
ISSN: 1511-788X; 2289-7860
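The summary names SMOTE and the geometric mean (G-mean) without defining them. As a rough illustration only, the sketch below implements the core interpolation step of the original SMOTE algorithm, which the listed variants build on, together with a G-mean metric, in plain Python. It is not the authors' pipeline: the function names `smote` and `geometric_mean`, their parameters, and the toy data are hypothetical. G-mean is computed here as the geometric mean of per-class recalls, which reduces to sqrt(sensitivity × specificity) for two classes; the abstract does not state the paper's exact definition. In practice a library such as imbalanced-learn, which provides `BorderlineSMOTE`, `KMeansSMOTE`, `SMOTEENN`, and `SMOTETomek`, would normally be used instead.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic minority samples (basic SMOTE).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest neighbours within the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of base, excluding base itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def geometric_mean(y_true, y_pred):
    """Hypothetical helper: G-mean as the geometric mean of per-class recall."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return math.prod(recalls) ** (1 / len(classes))


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote(minority, n_new=4, k=2, seed=42))
    # A majority-only predictor: 75% accuracy, but zero minority recall.
    print(geometric_mean([0, 0, 0, 1], [0, 0, 0, 0]))  # prints 0.0
```

In the usage example, the predictor that always outputs the majority class reaches 0.75 accuracy yet scores a G-mean of 0.0, because its recall on the minority class is zero. This illustrates why imbalance studies report G-mean alongside accuracy. Cleaning steps such as the Edited Nearest Neighbours stage in SMOTEENN are not shown here.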