Transformer-Based Vulnerability Detection in IoT Firmware Binaries Using Opcode Sequences

Bibliographic Details
Main Authors: M. Nandish, Jalesh Kumar, H. G. Mohan, M. V. Manoj Kumar
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11080410/
Description
Summary: Firmware security is critical to maintaining the integrity of embedded systems. However, detecting vulnerabilities in firmware binaries is challenging because of the absence of source code, the inherent complexity of binary structures, the diversity of hardware architectures, and the difficulty of extracting deep contextual representations from binaries. The proposed approach uses Decoding-enhanced BERT with Disentangled Attention (DeBERTa), a transformer-based model, to detect vulnerabilities in firmware binaries. First, firmware binaries are disassembled to extract opcode sequences, which are then tokenized and encoded as inputs to the DeBERTa model. The model processes the opcode sequences and generates embeddings, which are passed to three downstream classifiers: Random Forest, a Multi-Layer Perceptron, and a GAN-based classifier. In this way the model learns deep contextual representations of firmware code, capturing intricate syntactic and semantic relationships. The evaluation is conducted on IoT firmware binaries collected from real-world IoT projects, reflecting practical and diverse vulnerability scenarios. Experimental results show that the proposed DeBERTa-based model achieves 97% accuracy, 97% recall, and a 94.6% F1-score, outperforming conventional embedding techniques. The findings indicate that opcode sequence features can effectively and reliably identify different types of vulnerable and benign IoT samples.
ISSN: 2169-3536
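
The summary describes disassembling firmware, extracting opcode sequences, and tokenizing and encoding them as model inputs. The following is a minimal sketch of that preprocessing step only, not the authors' implementation. It assumes tab-separated, objdump-style disassembly text as input; the function names, the BERT-style special tokens, the vocabulary layout, and the sample ARM listing are all hypothetical and chosen for illustration. The record does not state which disassembler or tokenizer the paper uses.

```python
from collections import Counter

# Hypothetical special tokens, following common BERT-style conventions.
PAD, UNK, CLS, SEP = "[PAD]", "[UNK]", "[CLS]", "[SEP]"


def extract_opcodes(disassembly):
    """Return the mnemonic of each instruction line.

    Assumes tab-separated lines of the form
    '<address>:\t<raw bytes>\t<mnemonic>\t<operands>'.
    Lines that do not fit this shape (labels, blanks) are skipped.
    """
    opcodes = []
    for line in disassembly.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or not parts[0].strip().endswith(":"):
            continue
        fields = parts[2].split()
        if fields:
            opcodes.append(fields[0].lower())
    return opcodes


def build_vocab(sequences):
    """Map each opcode to an integer ID; special tokens take IDs 0-3."""
    counts = Counter(op for seq in sequences for op in seq)
    vocab = {tok: i for i, tok in enumerate([PAD, UNK, CLS, SEP])}
    # Sort by descending frequency, then alphabetically, for stable IDs.
    for op, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab[op] = len(vocab)
    return vocab


def encode(opcodes, vocab, max_len):
    """Truncate or pad to max_len; return (input_ids, attention_mask)."""
    tokens = [CLS] + opcodes[: max_len - 2] + [SEP]
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [vocab[PAD]] * padding, mask + [0] * padding


# Illustrative ARM listing (made up for this sketch, not from the paper).
SAMPLE = "\n".join([
    "00010434 <main>:",
    "   10434:\te92d4800 \tpush\t{fp, lr}",
    "   10438:\te28db004 \tadd\tfp, sp, #4",
    "   1043c:\te59f0008 \tldr\tr0, [pc, #8]",
    "   10440:\tebffffa6 \tbl\t102e0 <puts@plt>",
    "   10444:\te8bd8800 \tpop\t{fp, pc}",
])

if __name__ == "__main__":
    ops = extract_opcodes(SAMPLE)
    vocab = build_vocab([ops])
    input_ids, attention_mask = encode(ops, vocab, max_len=8)
    print(ops)
    print(input_ids, attention_mask)
```

In a full pipeline of the kind the summary outlines, the resulting ID sequences and attention masks would be fed to the transformer, and the embeddings it produces would be passed to a classifier. Those stages are omitted here because the record gives no detail on how they are configured.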