A composite Feature Selection Method to improve Classifying Imbalanced Big Data
Feature selection is one of the methods used to improve the performance of machine learning algorithms, especially when classifying big data. The need for new methods becomes greater when dealing with imbalanced big data. An imbalance in the data appears when there is a discrepancy in the...
| Main Authors: | Shaymaa Razoqi, Ghayda Al-Talib |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Mosul University, 2024-12-01 |
| Series: | Al-Rafidain Journal of Computer Sciences and Mathematics |
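| Subjects: | imbalance data; big data; permutation-based features importance; information gain; ensemble learning |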
| Online Access: | https://csmj.uomosul.edu.iq/article_185892_4c1c94cb38e386b8fb26bfb239ba8e34.pdf |
| _version_ | 1849692504016814080 |
|---|---|
| author | Shaymaa Razoqi, Ghayda Al-Talib |
| collection | DOAJ |
| description | Feature selection is one of the methods used to improve the performance of machine learning algorithms, especially when classifying big data. The need for new methods becomes greater when dealing with imbalanced big data. An imbalance in the data appears when there is a discrepancy in the distribution of samples between the two data classes in the training set. Several methods are used to solve the imbalance problem: some depend on redistributing the data, and others depend on improving the classification algorithm itself. Feature selection can also improve the classification results on imbalanced data when the features are chosen carefully. Therefore, this research proposes a composite feature selection method that combines the filter feature selection technique and permutation-based feature importance with the ensemble learning method. Three classifiers were used with three performance metrics (AUC, G-mean, and F-score) to show the effect of the proposed feature selection method on imbalanced big data. Using the proposed method led to improved classification on five standard imbalanced data sets. |
| format | Article |
| id | doaj-art-eace0be687924dd6b27525c10f1afaa7 |
| institution | DOAJ |
| issn | 1815-4816, 2311-7990 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Mosul University |
| record_format | Article |
| series | Al-Rafidain Journal of Computer Sciences and Mathematics |
| spelling | doaj-art-eace0be687924dd6b27525c10f1afaa7; 2025-08-20T03:20:40Z; eng; Mosul University; Al-Rafidain Journal of Computer Sciences and Mathematics; 1815-4816; 2311-7990; 2024-12-01; 18; 2; 70; 81; 10.33899/csmj.2024.149115.1117; 185892; A composite Feature Selection Method to improve Classifying Imbalanced Big Data; Shaymaa Razoqi (0); Ghayda Al-Talib (1); (0) Department of Computer Science, College of Education for Pure Science, University of Mosul, Mosul, Iraq; (1) Department of Computer Science, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq; https://csmj.uomosul.edu.iq/article_185892_4c1c94cb38e386b8fb26bfb239ba8e34.pdf; imbalance data; big data; permutation-based features importance; information gain; ensemble learning |
| title | A composite Feature Selection Method to improve Classifying Imbalanced Big Data |
| topic | imbalance data; big data; permutation-based features importance; information gain; ensemble learning |
| url | https://csmj.uomosul.edu.iq/article_185892_4c1c94cb38e386b8fb26bfb239ba8e34.pdf |
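
The record names information gain as a keyword and lists G-mean and F-score among the evaluation metrics. The following is a minimal Python sketch of those two ingredients only, assuming discrete features and binary labels. It is not the authors' implementation: the permutation-based importance step, the ensemble learner, and the AUC metric are omitted, and the toy data and all function names (`entropy`, `information_gain`, `g_mean_and_f_score`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the label entropy left after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def g_mean_and_f_score(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity); F-score = harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return math.sqrt(recall * specificity), f_score


# Hypothetical toy data: 8 majority samples (class 0) and 2 minority samples (class 1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
features = {
    "informative": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # separates the two classes
    "noise": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],        # unrelated to the class
}

# Filter step: rank features by information gain, highest first.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)

# A classifier that always predicts the majority class is 80% accurate here,
# yet it never finds a minority sample, which G-mean and F-score expose.
all_majority = [0] * len(labels)
g_mean, f_score = g_mean_and_f_score(labels, all_majority)
```

On this toy data, always predicting the majority class is right for 8 of 10 samples, while both G-mean and F-score fall to zero because no minority sample is recovered. This is the usual reason such metrics are preferred over plain accuracy on imbalanced data.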