Effective Context-Aware File Path Embeddings for Anomaly Detection
In digital forensics, especially Windows forensics, identifying anomalous file paths is crucial when dealing with large-scale data. Traditional static embedding methods, which aggregate token-level representations, discard hierarchical and sequential relationships in file paths, leading to misclassi...
| Main Authors: | Ra-Kyung Lee, Hyun-Min Song, Taek-Young Youn |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG 2025-05-01 |
| Series: | Systems |
| Subjects: | digital forensics, file path analysis, sequence modeling, word embeddings, anomaly detection |
| Online Access: | https://www.mdpi.com/2079-8954/13/6/403 |
| _version_ | 1849467255076683776 |
|---|---|
| author | Ra-Kyung Lee, Hyun-Min Song, Taek-Young Youn |
| collection | DOAJ |
| description | In digital forensics, especially Windows forensics, identifying anomalous file paths is crucial when dealing with large-scale data. Traditional static embedding methods, which aggregate token-level representations, discard hierarchical and sequential relationships in file paths, leading to misclassification of anomalies. This study introduces a Transformer-based sequence modeling approach to classify anomalous file paths, addressing these limitations by preserving positional and contextual relationships. File paths from the NTFS Master File Table (MFT) were embedded using FastText to capture structural and contextual dependencies. Unlike static embeddings, the proposed method processes file paths as structured sequences to enhance anomaly detection accuracy. Extensive experiments showed that Transformer models generally outperformed traditional methods in detecting structured anomalies. The Transformer model with FastText embeddings (32 dimensions) achieved an accuracy of 0.9781 and an F1-score of 0.9782, while Random Forest with FastText embeddings (64 dimensions) achieved an accuracy of 0.9729 and an F1-score of 0.9729. These findings suggest that a hybrid anomaly detection framework combining Transformer-based models with traditional techniques could enhance robustness in forensic investigations. Future research should explore combining both methods to improve adaptability across diverse forensic scenarios. |
| format | Article |
| id | doaj-art-56b7a87c90c0471c86df9d653e375bde |
| institution | Kabale University |
| issn | 2079-8954 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Systems |
| spelling | doaj-art-56b7a87c90c0471c86df9d653e375bde; 2025-08-20T03:27:43Z; eng; MDPI AG; Systems; 2079-8954; 2025-05-01; 13; 6; 403; 10.3390/systems13060403; Effective Context-Aware File Path Embeddings for Anomaly Detection; Ra-Kyung Lee, Hyun-Min Song, Taek-Young Youn (all: Department of Cyber Security, Dankook University, Jukjeon-ro 152, Yongin-si 16890, Republic of Korea); https://www.mdpi.com/2079-8954/13/6/403; digital forensics, file path analysis, sequence modeling, word embeddings, anomaly detection |
| title | Effective Context-Aware File Path Embeddings for Anomaly Detection |
| topic | digital forensics, file path analysis, sequence modeling, word embeddings, anomaly detection |
| url | https://www.mdpi.com/2079-8954/13/6/403 |