FMCNN: Raw-Data Type Identification Using Feature Matrix and CNN

Bibliographic Details
Main Authors: Eunsu Lee, Hoon Yoo
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11122528/
Description
Summary: This paper introduces FMCNN, a novel classification method that combines a two-dimensional feature matrix capturing raw data characteristics with a CNN-based classifier. Raw data are the most fundamental form of digital information, and accurate identification of the data type and the bit length per sample is essential for meaningful use and automated processing in the absence of metadata. Existing file type identification techniques remain inadequate for handling such data. Before feeding raw data into the CNN model, FMCNN extracts feature matrices from the raw data to enable classification regardless of content or format. The method was evaluated on a dataset of twenty-one data types and compared with several state-of-the-art classification models. It achieved an average accuracy of 89.65 percent and a weighted average F1 score of 0.8938, outperforming all comparison models despite using between five and two hundred times fewer parameters. These results demonstrate that the proposed method combines simplicity with high classification accuracy.
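The summary reports both accuracy and a weighted average F1 score. As a minimal sketch, assuming the standard support-weighted definition (each class's F1 weighted by that class's share of the true samples), the two metrics can be computed as below. The function names and the example data-type labels are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)  # number of true samples in each class
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n_cls / total) * f1  # weight each class by its support
    return score


# Hypothetical labels for five raw-data samples (type + bits per sample).
y_true = ["int8", "int8", "int16", "int16", "float32"]
y_pred = ["int8", "int16", "int16", "int16", "float32"]
print(accuracy(y_true, y_pred))     # 0.8
print(weighted_f1(y_true, y_pred))  # about 0.7867
```

Classes that appear only among the predictions have zero support and therefore contribute nothing to the weighted score. Because of this weighting, frequent data types dominate the weighted F1, which is why it can differ from plain accuracy.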
ISSN: 2169-3536