A composite Feature Selection Method to improve Classifying Imbalanced Big Data

Feature selection is one of the methods used to improve the performance of machine learning algorithms, especially when classifying big data. Finding new methods becomes even more necessary when dealing with imbalanced big data. An imbalance in the data appears when there is a discrepancy in the...

Bibliographic Details
Main Authors: Shaymaa Razoqi, Ghayda Al-Talib
Format: Article
Language: English
Published: Mosul University 2024-12-01
Series: Al-Rafidain Journal of Computer Sciences and Mathematics
Subjects: imbalance data; big data; permutation-based features importance; information gain; ensemble learning
Online Access: https://csmj.uomosul.edu.iq/article_185892_4c1c94cb38e386b8fb26bfb239ba8e34.pdf
_version_ 1849692504016814080
author Shaymaa Razoqi
Ghayda Al-Talib
collection DOAJ
description Feature selection is one of the methods used to improve the performance of machine learning algorithms, especially when classifying big data. Finding new methods becomes even more necessary when dealing with imbalanced big data. An imbalance in the data appears when there is a discrepancy in the sampling distribution between the two data classes in the training set. Several methods are used to solve the imbalance problem: some depend on redistributing the data, and others depend on improving the classification algorithm itself. Feature selection can also improve the classification results for imbalanced data when the features are chosen carefully. Therefore, this research proposes a composite feature selection method that combines the filter feature selection technique with permutation-based feature importance and the ensemble learning method. Three classifiers were used with three performance metrics (AUC, G-mean, and F-score) to show the effect of the proposed feature selection method on imbalanced big data. Using the proposed method improved classification on five standard imbalanced data sets.
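As an illustration of two of the metrics named in the description, the following minimal sketch computes G-mean and F-score from binary confusion-matrix counts and compares them with plain accuracy. It is not taken from the article: the function name `imbalance_metrics`, the choice of the F1 variant of the F-score, and the example counts are all assumptions made for illustration.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Return (accuracy, g_mean, f1) from binary confusion-matrix counts.

    The positive class is assumed to be the minority class.
    """
    # Recall on the minority class (true positive rate).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Recall on the majority class (true negative rate).
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # G-mean: geometric mean of the two per-class recalls.
    g_mean = math.sqrt(sensitivity * specificity)
    # F1: harmonic mean of precision and minority-class recall.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return accuracy, g_mean, f1

# Hypothetical counts: 10 minority samples and 990 majority samples.
acc, g, f1 = imbalance_metrics(tp=2, fn=8, fp=5, tn=985)
print(f"accuracy={acc:.3f}  g_mean={g:.3f}  f1={f1:.3f}")
```

With these hypothetical counts, accuracy stays high because the majority class dominates, while G-mean and F-score drop because most minority samples are missed. This is the usual reason such metrics are preferred over accuracy on imbalanced data.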
format Article
id doaj-art-eace0be687924dd6b27525c10f1afaa7
institution DOAJ
issn 1815-4816
2311-7990
language English
publishDate 2024-12-01
publisher Mosul University
record_format Article
series Al-Rafidain Journal of Computer Sciences and Mathematics
doi 10.33899/csmj.2024.149115.1117
volume 18
issue 2
pages 70-81
affiliation Shaymaa Razoqi: Department of Computer Science, College of Education for Pure Science, University of Mosul, Mosul, Iraq
Ghayda Al-Talib: Department of Computer Science, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq
title A composite Feature Selection Method to improve Classifying Imbalanced Big Data
topic imbalance data
big data
permutation-based features importance
information gain
ensemble learning