JDroid: Android malware detection using hybrid opcode feature vector

Bibliographic Details
Main Author: Recep Sinan Arslan
Format: Article
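Language: English
Published: PeerJ Inc. 2025-07-01
Series: PeerJ Computer Science
Subjects: Malware detection; Opcode sequences; Hybrid feature vector; Stacked generalized ensemble classifier
Online Access: https://peerj.com/articles/cs-3051.pdf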
collection DOAJ
description The rapid proliferation of devices running the Android operating system has made them the primary target for malware developers, and researchers are investigating different techniques to protect end users from these attackers. While many of these techniques detect malware successfully, they also have limitations: many applications today use advanced obfuscation, disguise, and variant-generation techniques to bypass detection tools, which creates difficulties for security experts. However, the rich semantic information hidden in opcodes offers a promising way to distinguish benign applications from malicious ones. In this study, we propose a tool called JDroid that uses opcodes (Dalvik opcodes and Java bytecode) as features based on static analysis. The tool aims to detect malicious applications with a unique ensemble model in a stacked generalized structure that uses different opcode sequences as a hybrid: each feature set is first trained separately, and the results are then combined by an ensemble decision. For this purpose, opcodes are extracted from APK files by code analysis and converted directly into binary vectors, with 1 or 0 indicating whether each opcode is used. Filtering and feature selection then reduce the feature set to a subset of 461 features, which increases efficiency and performance, avoids overfitting, and reduces computational cost. Experiments on the Drebin, Genome, MalDroid2020, CICInvesAndMal2019, and Omer datasets use an application pool of 14 thousand applications, and the classification performance is compared with different machine learning methods. Experimental results show that the proposed approach achieves an accuracy of 98.6% and an area under the curve (AUC) of 99.6% in malware detection without being affected by the obfuscation process.
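The abstract describes two mechanisms: encoding each APK's opcodes as a 0/1 presence vector, and a stacked generalized ensemble in which separate models are trained on the different opcode feature sets before an ensemble decision combines them. The Python sketch below illustrates both under stated assumptions; it is not JDroid's implementation. The opcode vocabularies, toy apps, and labels are invented. The Bernoulli naive Bayes base learners, the logistic-regression meta-learner, the 3-fold out-of-fold scheme, and every function and class name are hypothetical stand-ins, since the abstract does not name the models used. The filtering and feature selection step that yields 461 features is not shown.

```python
# Hypothetical sketch of binary opcode vectors + stacked generalization; not JDroid's code.
import math
from typing import List, Sequence

Vector = List[int]


def binary_vector(opcodes: Sequence[str], vocab: Sequence[str]) -> Vector:
    """Encode an app as 0/1 per vocabulary opcode: 1 if it occurs at least once."""
    present = set(opcodes)
    return [1 if op in present else 0 for op in vocab]


def sigmoid(t: float) -> float:
    t = max(min(t, 50.0), -50.0)  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-t))


class BernoulliNB:
    """Minimal Bernoulli naive Bayes on 0/1 vectors (an assumed level-0 learner)."""

    def fit(self, X: List[Vector], y: List[int]) -> "BernoulliNB":
        n_features = len(X[0])
        self.log_prior, self.p = {}, {}
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            # Laplace smoothing keeps every probability strictly inside (0, 1).
            self.log_prior[c] = math.log((len(rows) + 1) / (len(X) + 2))
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(n_features)]
        return self

    def predict_proba(self, x: Vector) -> float:
        """Return P(malware | x) from the two log-joint scores."""
        score = {}
        for c in (0, 1):
            score[c] = self.log_prior[c] + sum(
                math.log(p if xi else 1.0 - p) for xi, p in zip(x, self.p[c]))
        return sigmoid(score[1] - score[0])


def out_of_fold(X: List[Vector], y: List[int], k: int = 3) -> List[float]:
    """Level-0 predictions where each training row is scored by a model that never saw it."""
    preds = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = BernoulliNB().fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):
            preds[i] = model.predict_proba(X[i])
    return preds


def fit_logistic(Z: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Level-1 (meta) learner: logistic regression by plain SGD; last weight is the bias."""
    w = [0.0] * (len(Z[0]) + 1)
    for _ in range(epochs):
        for z, t in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + w[-1]) - t
            for j, zj in enumerate(z):
                w[j] -= lr * err * zj
            w[-1] -= lr * err
    return w


class StackedEnsemble:
    """One base model per opcode view (e.g. Dalvik, Java bytecode), combined by a meta-learner."""

    def fit(self, views: List[List[Vector]], y: List[int]) -> "StackedEnsemble":
        # Meta-features: one out-of-fold probability per view for every training app.
        meta_X = [list(z) for z in zip(*(out_of_fold(X, y) for X in views))]
        self.base = [BernoulliNB().fit(X, y) for X in views]
        self.w = fit_logistic(meta_X, y)
        return self

    def predict_proba(self, sample_views: List[Vector]) -> float:
        z = [m.predict_proba(x) for m, x in zip(self.base, sample_views)]
        return sigmoid(sum(wj * zj for wj, zj in zip(self.w, z)) + self.w[-1])


# Toy usage with invented vocabularies and labels (1 = malware, 0 = benign).
DALVIK = ["invoke-virtual", "const-string", "sget-object", "new-instance"]
JAVA = ["invokevirtual", "ldc", "getstatic", "new"]
mal = (["sget-object", "invoke-virtual"], ["getstatic", "invokevirtual"])
ben = (["const-string", "new-instance"], ["ldc", "new"])
apps = [mal, ben] * 3
labels = [1, 0] * 3
views = [[binary_vector(a[0], DALVIK) for a in apps],
         [binary_vector(a[1], JAVA) for a in apps]]
model = StackedEnsemble().fit(views, labels)
p_mal = model.predict_proba([binary_vector(mal[0], DALVIK), binary_vector(mal[1], JAVA)])
p_ben = model.predict_proba([binary_vector(ben[0], DALVIK), binary_vector(ben[1], JAVA)])
print(f"P(malware): malware-like {p_mal:.3f}, benign-like {p_ben:.3f}")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions from base models that have already seen the same rows, is the usual way stacking keeps the meta-learner from overtrusting overfit base models. With real data, stronger base classifiers and a held-out test split would replace the toy choices here.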
id doaj-art-5a99c376e93a40dab020df55d612e5be
institution Kabale University
issn 2376-5992
doi 10.7717/peerj-cs.3051
volume 11, article e3051
topic Malware detection
Opcode sequences
Hybrid feature vector
Stacked generalized ensemble classifier