ARKAIV: Predicting Data Exfiltration Using Supervised Machine Learning Based on Tactics Mapping From Threat Reports and Event Logs

Bibliographic Details
Main Authors: Arif Rahman Hakim, Kalamullah Ramli, Muhammad Salman, Bernardi Pranggono, Esti Rahmawati Agustina
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10818683/
Description
Summary: Data breach attacks are unique, particularly when attackers exfiltrate data from their target’s systems. As data breaches continue to increase in both frequency and severity, they pose escalating risks to organizations and society. Despite this, no prior research has focused on predicting exfiltration occurrences based on sequences of tactics identified from low-level logs. Additionally, integrating low-level logs with high-level conceptual frameworks remains a critical challenge. There is an evident need to automate the mapping process and to develop advanced methods that help defenders analyze exfiltration occurrences within their systems. This paper addresses these gaps by developing a machine learning (ML) model that predicts the occurrence of data exfiltration by analyzing the sequence of tactics employed by an attacker. We propose ARKAIV, which provides two main contributions: bridging the gap between low-level logs and high-level conceptual frameworks for data breaches, and integrating collected event logs with ML models to predict exfiltration tactics. To create our dataset, we extracted tactics from threat reports, refined the data to include ten features, and balanced it using the Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbor (SMOTE+ENN). The ML model takes tactics identified from low-level logs as input and predicts exfiltration occurrences. To optimize model performance, we benchmarked three resampling methods, five feature selection techniques, and five ML algorithms. Our key contributions include the creation of a novel dataset, the comprehensive techniques used to develop the ML model, and the proposed prediction method, which advances existing research. Additionally, we validate ARKAIV with case studies using event logs from real-world incidents. Our findings demonstrate that ARKAIV effectively predicts exfiltration occurrences with higher accuracy than existing approaches, providing a valuable tool for enhancing organizational cybersecurity.
ISSN: 2169-3536
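
Illustrative note: the summary mentions balancing the dataset with SMOTE+ENN. The following is a minimal pure-Python sketch of that general idea, not the authors' implementation. The toy feature vectors, the labels (1 = exfiltration observed, 0 = not observed), the function names, and the choice of a majority-vote rule for the ENN step are all assumptions made for illustration; the article's actual ten features are not listed in this record. In practice a library such as imbalanced-learn, which provides a `SMOTEENN` class, would more likely be used.

```python
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest(idx, X, k, candidates):
    """Indices of the k candidates closest to X[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: euclidean(X[idx], X[j]))[:k]


def smote(X, y, minority, k=3, seed=0):
    """Oversample the minority class (binary case) until both classes match.

    Each synthetic point is a random interpolation between an original
    minority sample and one of its k nearest original minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    X, y = [list(row) for row in X], list(y)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)
    for _ in range(max(0, n_needed)):
        i = rng.choice(min_idx)
        j = rng.choice(nearest(i, X, k, min_idx))
        gap = rng.random()
        X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y.append(minority)
    return X, y


def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbours.

    Assumption: a majority-vote rule is used here; other ENN variants are stricter.
    """
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in nearest(i, X, k, all_idx))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: two numeric features per incident, imbalanced labels.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [0.2, 0.4], [0.4, 0.1], [1.0, 1.0], [0.9, 1.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = smote(X, y, minority=1)                 # oversampling step
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal)  # cleaning step

print("before:", Counter(y))
print("after SMOTE:", Counter(y_bal))
print("after ENN:", Counter(y_clean))
```

The oversampling step equalizes the class counts, and the cleaning step then removes samples that sit among neighbours of the other class, which is the usual motivation for combining the two.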