FMCNN: Raw-Data Type Identification Using Feature Matrix and CNN
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/11122528/ |
| Summary: | This paper introduces FMCNN, a novel classification method that combines a two-dimensional feature matrix capturing raw data characteristics with a CNN-based classifier. Raw data are the most fundamental form of digital information, and accurate identification of the data type and the bit length per sample is essential for meaningful use and automated processing when metadata are absent. Existing file type identification techniques remain inadequate for handling such data. FMCNN extracts feature matrices from the raw data before passing them to the CNN model, which enables classification regardless of content or format. The method was evaluated on a dataset of twenty-one data types and compared with several state-of-the-art classification models. It achieved an average accuracy of 89.65 percent and a weighted average F1 score of 0.8938, outperforming all comparison models while using between five and two hundred times fewer parameters. These results demonstrate that the proposed method achieves both simplicity and high classification accuracy. |
|---|---|
| ISSN: | 2169-3536 |
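
The summary does not say how the feature matrix is built, so the following is only an illustrative sketch of the general idea of turning a raw byte stream into a fixed-size two-dimensional matrix that a CNN could consume. It uses a stand-in feature, a quantized histogram of adjacent byte pairs, which is an assumption and not the feature matrix defined in the paper. The function name `byte_pair_matrix` and the `size` parameter are hypothetical.

```python
from collections import Counter


def byte_pair_matrix(data: bytes, size: int = 16) -> list[list[float]]:
    """Return a size x size matrix of normalized adjacent-byte-pair counts.

    Stand-in feature (assumption, not the paper's definition): each byte is
    quantized into `size` equal-width bins, and cell [i][j] holds the fraction
    of adjacent byte pairs whose first byte falls in bin i and second in bin j.
    """
    if 256 % size != 0:
        raise ValueError("size must divide 256")
    width = 256 // size
    # Count quantized (current byte, next byte) pairs over the whole stream.
    counts = Counter((a // width, b // width) for a, b in zip(data, data[1:]))
    total = max(len(data) - 1, 1)  # avoid division by zero on tiny inputs
    return [[counts[(i, j)] / total for j in range(size)] for i in range(size)]


# Example input: 16-bit little-endian samples of a slow ramp. The low byte
# cycles through all values while the high byte stays at 0 or 1, so the
# pair statistics differ from those of, say, 8-bit samples.
ramp16 = b"".join(v.to_bytes(2, "little") for v in range(512))
matrix = byte_pair_matrix(ramp16)
```

The matrix has the same shape whatever the input length, which is the property that lets a fixed-architecture CNN classify streams of arbitrary size. Any real reproduction of FMCNN would need the feature definition from the article itself.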