Fuzzy–synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets

In previous work, imbalanced datasets composed of more benign samples (the majority class) than the malicious one (the minority class) have been widely adopted in Android malware detection. These imbalanced datasets bias learning toward the majority class, so that the minority class examples are mor...

Bibliographic Details
Main Authors: Yanping Xu, Chunhua Wu, Kangfeng Zheng, Xinxin Niu, Yixian Yang
Format: Article
Language: English
Published: Wiley 2017-04-01
Series: International Journal of Distributed Sensor Networks
Online Access: https://doi.org/10.1177/1550147717703116
collection DOAJ
description In previous work, imbalanced datasets containing more benign samples (the majority class) than malicious samples (the minority class) have been widely adopted in Android malware detection. These imbalanced datasets bias learning toward the majority class, so that minority class examples are more likely to be misclassified. To solve this problem, we propose a new oversampling method called fuzzy–synthetic minority oversampling technique, which is based on fuzzy set theory and the synthetic minority oversampling technique. As the sample size of the majority class increases relative to that of the minority class, fuzzy–synthetic minority oversampling technique generates more synthetic examples for each minority class example in the fuzzy region, where minority examples have a low degree of membership in the minority class and are more likely to be misclassified. Using the new synthetic examples, classifiers build larger decision regions that contain more minority examples and are no longer biased toward the majority class. Compared with the synthetic minority oversampling technique and the Borderline–synthetic minority oversampling technique, fuzzy–synthetic minority oversampling technique achieves higher accuracy on both the minority class and the entire dataset.
id doaj-art-5d6b763dcfd54571a83cdccfb9dd3cd4
institution OA Journals
issn 1550-1477
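The oversampling idea summarized in the description field can be illustrated with a minimal Python sketch. This record carries only the abstract, so the details here are assumptions rather than the authors' method: the membership degree is taken to be the share of minority labels among a point's k nearest neighbours, and minority examples with lower membership are picked more often as seeds for SMOTE-style interpolation. The names `k_nearest` and `fuzzy_smote_sketch` are hypothetical.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return indices of the k candidates closest to `point` (Euclidean)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return order[:k]


def fuzzy_smote_sketch(X, y, minority_label, k=5, seed=0):
    """Balance a two-class dataset by interpolating between minority points.

    Assumption (not taken from the paper): membership of a minority point is
    the fraction of minority labels among its k nearest neighbours, and the
    chance of seeding a synthetic example grows with (1 - membership).
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(X) - len(minority)) - len(minority)
    if n_needed <= 0 or len(minority) < 2:
        return list(X), list(y)  # already balanced, or nothing to interpolate

    weights, neighbours = [], []
    for p in minority:
        # Drop the closest hit: it is the point itself (or an exact duplicate).
        near_all = k_nearest(p, X, k + 1)[1:]
        mu = sum(y[j] == minority_label for j in near_all) / len(near_all)
        weights.append(1.0 - mu + 1e-9)  # low membership -> larger weight
        neighbours.append(k_nearest(p, minority, k + 1)[1:])

    synthetic = []
    seeds = rng.choices(range(len(minority)), weights=weights, k=n_needed)
    for i in seeds:
        a = minority[i]
        b = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # position along the segment from a to b
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])

    return list(X) + synthetic, list(y) + [minority_label] * n_needed
```

Calling `fuzzy_smote_sketch(X, y, minority_label=1)` on lists of feature vectors and labels returns the original rows followed by the synthetic minority rows. The weighting scheme is the part most likely to differ from the published method, which ties the number of synthetic examples to the class ratio and to a fuzzy membership function not given in this record.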