Integrating Information Gain and Chi-Square for Enhanced Malware Detection Performance

Bibliographic Details
Main Authors:	Fauzi Adi Rafrastara, Wildanil Ghozi, Ramadhan Rakhmat Sani, Lekso Budi Handoko, Abdussalam Abdussalam, Elkaf Rahmawan Pramudya, Faizal M. Abdollah
Format:	Article
Language:	English
Published:	UUM Press 2025-01-01
Series:	Journal of ICT
Subjects:	Malware detection; IGCS; feature selection; Information Gain; Chi-Square
Online Access:	https://e-journal.uum.edu.my/index.php/jict/article/view/25567

Description
Summary:	Malware is a serious and continuously evolving threat in the modern digital environment. Detecting it is essential to protect devices and systems from data corruption, data theft, account compromise, and unauthorized access that could lead to total system takeover. As malware has progressed from simple monomorphic variants to more sophisticated oligomorphic, polymorphic, and metamorphic forms, traditional signature-based methods have reached their limits and machine learning-based detection is now required. Recent studies have shown that machine learning algorithms can address this challenge, and some have also applied feature selection methods to improve detection efficiency. However, these approaches still produce false positives and false negatives, whereas malware detection aims for zero tolerance of such errors. This study introduces IGCS, a combined feature selection method that integrates Information Gain with Chi-Square (χ²) to improve both the effectiveness and the efficiency of machine learning classifiers. With IGCS, six classifiers (Random Forest, XGBoost, kNN, Decision Tree, Logistic Regression, and Naïve Bayes) achieved higher performance scores than in the other scenarios tested, namely classifiers combined with Information Gain, Chi-Square, or PCA, or used without any feature selection. Random Forest with 30 features selected by IGCS outperformed every other combination of classifier and feature selection method, achieving 99.0% accuracy, recall, precision, and F1-score. This combination was also more efficient, with a 52.5% decrease in training time and a 56.9% decrease in testing time.
ISSN:	1675-414X 2180-3862
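
The summary above describes IGCS as a feature selection step that integrates Information Gain with Chi-Square, but this record does not say how the two scores are combined. The following is a minimal sketch of one plausible reading, not the authors' method: it scores each discrete feature with both measures and keeps the features with the best mean rank. The mean-rank rule, the function names (`igcs_select`, `ranks`), and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for value, f_count in feature_totals.items():
        for label, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def ranks(scores):
    """Map each score to its rank, where rank 0 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def igcs_select(columns, labels, k):
    """ASSUMED combination rule: keep the k features with the best mean rank
    across Information Gain and Chi-Square. The paper may combine them
    differently."""
    ig_ranks = ranks([information_gain(col, labels) for col in columns])
    chi_ranks = ranks([chi_square(col, labels) for col in columns])
    mean_rank = [(a + b) / 2 for a, b in zip(ig_ranks, chi_ranks)]
    return sorted(range(len(columns)), key=lambda i: mean_rank[i])[:k]


# Toy data (made up): 1 = malware, 0 = benign; three binary features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly predictive of the label
    [1, 1, 1, 0, 1, 0, 0, 0],  # partially predictive
    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
]
selected = igcs_select(columns, labels, k=2)
print(selected)
```

On real malware datasets the features are often continuous, so a practical version would more likely rely on library scorers such as scikit-learn's `mutual_info_classif` and `chi2` than on the hand-written functions here.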

Similar Items