ARKAIV: Predicting Data Exfiltration Using Supervised Machine Learning Based on Tactics Mapping From Threat Reports and Event Logs
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10818683/ |
| Summary: | Data breach attacks are unique, particularly when attackers exfiltrate data from their targets' systems. As data breaches continue to increase in both frequency and severity, they pose escalating risks to organizations and society. Despite this, no prior research has focused on predicting exfiltration occurrences from sequences of tactics identified in low-level logs. Integrating low-level logs with high-level conceptual frameworks also remains a critical challenge, so there is a clear need to automate the mapping process and to develop advanced methods that help defenders analyze exfiltration occurrences within their systems. This paper addresses these gaps by developing a machine learning (ML) model that predicts the occurrence of data exfiltration from the sequence of tactics employed by an attacker. We propose ARKAIV, which makes two main contributions: bridging the gap between low-level logs and high-level data breach conceptual frameworks, and integrating collected event logs with ML models to predict exfiltration tactics. To create our dataset, we extracted tactics from threat reports, refined the data to ten features, and balanced it using the Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors (SMOTE+ENN). The ML model takes tactics identified from low-level logs as input and predicts whether exfiltration occurs. To optimize model performance, we benchmarked three resampling methods, five feature selection techniques, and five ML algorithms. Our key contributions are a novel dataset, the comprehensive techniques used to develop the ML model, and the proposed prediction method, which advances existing research. We also validate ARKAIV with case studies using event logs from real-world incidents. Our findings demonstrate that ARKAIV effectively predicts exfiltration occurrences with higher accuracy than existing approaches, providing a valuable tool for enhancing organizational cybersecurity. |
| ISSN: | 2169-3536 |
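The summary names SMOTE+ENN balancing and prediction from tactic sequences but not the feature set or the classifiers, so the following is only a minimal, stdlib-only Python sketch of that kind of pipeline under stated assumptions. Tactic sequences are one-hot encoded over a hypothetical list of MITRE ATT&CK tactic names (`TACTICS`). SMOTE and Wilson's majority-vote ENN are implemented by hand with Euclidean k-nearest neighbours. A k-NN vote (`predict`) stands in for the five ML algorithms the paper benchmarked. All function names and the toy sequences and labels are invented for illustration and are not the paper's data or method.

```python
import math
import random

# HYPOTHETICAL feature set: MITRE ATT&CK Enterprise tactic names other than
# Exfiltration (the prediction target). The paper's ten features are not listed
# in the summary, so this list is illustrative only.
TACTICS = [
    "Reconnaissance", "Initial Access", "Execution", "Persistence",
    "Privilege Escalation", "Defense Evasion", "Credential Access",
    "Discovery", "Lateral Movement", "Collection", "Command and Control",
]


def encode(sequence):
    """One-hot presence vector over TACTICS; unknown names (e.g. Exfiltration) are ignored."""
    seen = set(sequence)
    return [1.0 if tactic in seen else 0.0 for tactic in TACTICS]


def knn_indices(i, X, k):
    """Indices of the k nearest neighbours of X[i] within X (Euclidean), excluding i."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def smote(minority, n_new, k, rng):
    """SMOTE: interpolate between a random minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(i, minority, k))
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def enn(X, y, k):
    """Wilson's ENN: keep only samples whose label matches the majority label of their k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in knn_indices(i, X, k)]
        if max(set(votes), key=votes.count) == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, k=3, seed=0):
    """Oversample the minority class up to the majority count with SMOTE, then clean with ENN."""
    rng = random.Random(seed)
    counts = {label: y.count(label) for label in set(y)}
    minority_label = min(counts, key=counts.get)
    majority_label = max(counts, key=counts.get)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = smote(minority, counts[majority_label] - counts[minority_label], k, rng)
    return enn(X + synthetic, y + [minority_label] * len(synthetic), k)


def predict(X_train, y_train, sequence, k=3):
    """Stand-in classifier (k-NN majority vote); 1 means exfiltration is predicted."""
    x = encode(sequence)
    nearest = sorted(range(len(X_train)), key=lambda j: math.dist(x, X_train[j]))[:k]
    votes = [y_train[j] for j in nearest]
    return max(set(votes), key=votes.count)


# Invented toy data (NOT from the paper): 1 = incident ended in exfiltration.
BASE = ["Credential Access", "Lateral Movement", "Collection", "Command and Control"]
sequences = [
    BASE + ["Initial Access"], BASE + ["Execution"], BASE + ["Discovery"],
    BASE + ["Initial Access", "Execution"],
    ["Initial Access", "Execution"], ["Initial Access", "Execution", "Persistence"],
    ["Execution", "Defense Evasion"], ["Initial Access", "Discovery"],
    ["Execution", "Privilege Escalation", "Defense Evasion"],
    ["Initial Access", "Persistence", "Defense Evasion"],
    ["Reconnaissance", "Initial Access"], ["Initial Access", "Execution", "Discovery"],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

X_res, y_res = smote_enn([encode(s) for s in sequences], labels)
print(len(X_res), predict(X_res, y_res, BASE + ["Initial Access"]))  # → 16 1
```

The resampling is hand-rolled only to keep the sketch dependency-free. In practice, imbalanced-learn's `SMOTEENN` (in `imblearn.combine`) is a maintained implementation, and its ENN step by default uses a stricter rule than the majority vote shown here.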