Integrating Information Gain and Chi-Square for Enhanced Malware Detection Performance
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | UUM Press, 2025-01-01 |
Series: | Journal of ICT |
Subjects: | |
Online Access: | https://e-journal.uum.edu.my/index.php/jict/article/view/25567 |
Summary: | Malware is a serious and continuously evolving threat in the modern digital environment. Detecting it is essential to protect devices and systems from data corruption, data theft, account compromise, and unauthorized access that could lead to total system takeover. As malware has progressed from simple monomorphic variants to more sophisticated oligomorphic, polymorphic, and metamorphic forms, detection increasingly requires machine learning-based systems that overcome the limitations of traditional signature-based methods. Recent studies have applied machine learning algorithms to this problem, and some have added feature selection methods to improve detection efficiency. However, these approaches still produce false positives and false negatives, whereas malware detection aims for zero tolerance of both. This study introduces the IGCS method, a combined feature selection approach that integrates Information Gain with Chi-Square (χ²) to improve both the effectiveness and the efficiency of machine learning classifiers. With IGCS, six classifiers (Random Forest, XGBoost, kNN, Decision Tree, Logistic Regression, and Naïve Bayes) achieved higher performance scores than in the other scenarios: the same classifiers paired with Information Gain alone, Chi-Square alone, PCA, or no feature selection. Random Forest with 30 features selected by IGCS outperformed every other combination of classifier and feature selection method in malware detection, achieving 99.0% accuracy, recall, precision, and F1-score. This combination was also more efficient, with a 52.5% decrease in training time and a 56.9% decrease in testing time.
|
---|---|
ISSN: | 1675-414X 2180-3862 |
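
The summary names the two scores that IGCS integrates but does not say how they are combined. The following is a minimal sketch, not the article's procedure: it computes Information Gain and the Pearson chi-square statistic for discrete features and, as an assumption made purely for illustration, keeps the features with the best summed rank. The function names (`combined_rank_select`, `ranks`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for x, fx in feature_totals.items():
        for y, fy in label_totals.items():
            expected = fx * fy / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def ranks(scores):
    """Rank position of each score, where 0 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    position = [0] * len(scores)
    for rank, i in enumerate(order):
        position[i] = rank
    return position


def combined_rank_select(columns, labels, k):
    """ASSUMPTION: combine the two criteria by summing their ranks.

    Returns the sorted indices of the k features with the lowest rank sum.
    """
    ig = ranks([information_gain(col, labels) for col in columns])
    chi = ranks([chi_square(col, labels) for col in columns])
    total = [a + b for a, b in zip(ig, chi)]
    best = sorted(range(len(columns)), key=lambda i: total[i])[:k]
    return sorted(best)


# Toy data (hypothetical): 8 samples, 3 binary features, 1 = malware.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly predictive of the label
    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    [1, 1, 1, 0, 0, 0, 0, 1],  # partly predictive
]
selected = combined_rank_select(columns, labels, k=2)
```

On this toy data both criteria rank the features in the same order, so the uninformative second column is dropped. A union or intersection of the two top-k lists would be an equally plausible reading of "integrates"; the article itself is the authority on which rule IGCS uses.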