The Effect of Data Conversion Methods (Naive Bayes, C5.0 & Support Vector Machine) on the Performance of Classification Algorithms in Data Mining

Bibliographic Details
Main Authors: Hussein Ali Attallah, Ahmed Al-Asadi, Doctor, Sadeer Sadeq
Format: Article
Language: English
Published: Institute of Technology and Education Galileo da Amazônia 2025-07-01
Series: ITEGAM-JETIA
Online Access: http://itegam-jetia.org/journal/index.php/jetia/article/view/1718
Description
Summary: This study examines how data transformation affects the performance of the naive Bayes (NB), C5.0 and support vector machine (SVM) classifiers. Data were simulated under three sampling distributions (normal, chi-square, F), four sample sizes (100, 500, 1000, 10000) and five class distribution rates (0.1, 0.2, 0.3, 0.4, 0.5). Each simulated data set was transformed with min-max and z-score normalisation and with equal-width (EG) and equal-frequency (EF) interval discretisation. The comparative results showed that both the normalisation and the discretisation methods influenced SVM performance and contributed to better results. In terms of the classification success achieved with SVM, normalisation methods were more effective on average and for the chi-square distribution, while the unsupervised EF discretisation method was more effective for the F distribution.
ISSN:2447-0228
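
For readers unfamiliar with the four transformations named in the summary, the following is a minimal Python sketch of each one. It is not the authors' code. The function names, the use of the sample standard deviation in the z-score, and the rank-based handling of ties in equal-frequency binning are all assumptions made for illustration.

```python
import statistics


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant input: no spread to rescale
    return [(v - lo) / span for v in values]


def z_score(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]


def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal length."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum sits on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # Assumption: bins are assigned by rank, so tied values may be split.
        bins[i] = rank * k // n
    return bins


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0]  # hypothetical skewed sample
    print(min_max(x))
    print(z_score(x))
    print(equal_width_bins(x, 3))      # [0, 0, 0, 0, 1, 2]
    print(equal_frequency_bins(x, 3))  # [0, 0, 1, 1, 2, 2]
```

On the skewed sample above, equal-width binning puts most observations in the first interval, whereas equal-frequency binning spreads them evenly. This difference is one plausible reason the two discretisation methods could behave differently on skewed distributions such as chi-square and F.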