Effective Context-Aware File Path Embeddings for Anomaly Detection
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Systems |
| Online Access: | https://www.mdpi.com/2079-8954/13/6/403 |
| Summary: | In digital forensics, especially Windows forensics, identifying anomalous file paths is crucial when dealing with large-scale data. Traditional static embedding methods, which aggregate token-level representations, discard hierarchical and sequential relationships in file paths, leading to misclassification of anomalies. This study introduces a Transformer-based sequence modeling approach to classify anomalous file paths, addressing these limitations by preserving positional and contextual relationships. File paths from the NTFS Master File Table (MFT) were embedded using FastText to capture structural and contextual dependencies. Unlike static embeddings, the proposed method processes file paths as structured sequences to enhance anomaly detection accuracy. Extensive experiments showed that Transformer models generally outperformed traditional methods in detecting structured anomalies. The Transformer model with FastText embeddings (32 dimensions) achieved an accuracy of 0.9781 and an F1-score of 0.9782, while Random Forest with FastText embeddings (64 dimensions) achieved an accuracy of 0.9729 and an F1-score of 0.9729. These findings suggest that a hybrid anomaly detection framework combining Transformer-based models with traditional techniques could enhance robustness in forensic investigations. Future research should explore combining both methods to improve adaptability across diverse forensic scenarios. |
| ISSN: | 2079-8954 |
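
The summary's central claim is that aggregating token-level embeddings discards the order of path components, while a sequence representation preserves it. The following is a minimal sketch of that contrast only, not the article's pipeline: `tokenize_path`, `toy_embed`, `mean_pool`, and `as_sequence` are hypothetical helpers, the two paths are made up for illustration, and a hash-based integer vector stands in for FastText so the example runs with the standard library alone.

```python
import hashlib
import re


def tokenize_path(path):
    """Split a Windows-style path into lowercase components (hypothetical tokenizer)."""
    return [part.lower() for part in re.split(r"[\\/]+", path) if part]


def toy_embed(token, dim=8):
    """Deterministic integer vector per token; a stand-in, NOT FastText."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return list(digest[:dim])


def mean_pool(vectors):
    """Static aggregation: average token vectors, discarding their order."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def as_sequence(vectors, max_len=6):
    """Sequence view: keep token order, truncate or zero-pad to max_len."""
    dim = len(vectors[0])
    padding = [[0] * dim for _ in range(max_len - len(vectors))]
    return vectors[:max_len] + padding


# Two made-up paths built from the same components in a different order.
path_a = r"C:\Windows\System32\Temp\svchost.exe"
path_b = r"C:\Windows\Temp\System32\svchost.exe"

vectors_a = [toy_embed(t) for t in tokenize_path(path_a)]
vectors_b = [toy_embed(t) for t in tokenize_path(path_b)]

# Pooling sums the same token vectors, so the two paths become indistinguishable.
pooled_equal = mean_pool(vectors_a) == mean_pool(vectors_b)
# The ordered sequences still differ at the swapped positions.
sequences_differ = as_sequence(vectors_a) != as_sequence(vectors_b)

print("pooled vectors identical:", pooled_equal)
print("ordered sequences differ:", sequences_differ)
```

In this sketch the pooled vectors compare equal while the ordered sequences do not, which is the kind of information a sequence model such as a Transformer can use and a classifier fed only pooled vectors cannot. The padded sequence would still need positional information and a trained model on top; those parts are outside what the summary specifies.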