The Improved Kurdish Dialect Classification Using Data Augmentation and ANOVA-Based Feature Selection

Classifying Kurdish dialects is difficult because the phonetic distinctions among them are subtle. In this research, we applied advanced methods to improve the accuracy of Kurdish dialect classification. We examined the dataset’s stability and variation through the use of t...

Bibliographic Details
Main Authors: Karzan J. Ghafoor, Sarkhel H. Karim, Karwan M. Hama Rawf, Ayub O. Abdulrahman
Format: Article
Language: English
Published: Koya University 2025-03-01
Series: ARO-The Scientific Journal of Koya University
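Subjects: 1D convolutional neural network; Data augmentation; Feature selection; Kurdish dialect identification; Sound feature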
Online Access: https://aro.koyauniversity.org/index.php/aro/article/view/1897
collection DOAJ
description Classifying Kurdish dialects is difficult because the phonetic distinctions among them are subtle. In this research, we applied advanced methods to improve the accuracy of Kurdish dialect classification. We examined the dataset’s stability and variation using time-stretching and noise-augmentation methods. An analysis of variance (ANOVA) filter approach is applied to make feature selection (FS) more efficient and to highlight the most relevant features for dialect classification. The ANOVA filter method ranks features by comparing their means across the dialect groups, which improved FS. To improve dialect classification, a 1D convolutional neural network model was supplied with a dataset to which ANOVA FS had been applied. The model performed very strongly, reaching an accuracy of 99.42%, which exceeds the 95.5% accuracy reported in earlier research. The findings demonstrate how combining time stretching and FS methods can improve the accuracy of Kurdish dialect classification. This project advances the understanding and application of machine learning in the field of linguistic diversity and dialectology.
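The ANOVA filter ranking mentioned in the description can be illustrated with a minimal sketch. It computes a one-way ANOVA F statistic per feature (between-group variance of the class means divided by within-group variance) and keeps the highest-scoring features. This is not the authors' implementation: the function names `anova_f_scores` and `top_k_features`, the toy feature values, the labels "a" and "b", and the choice of `k` are all assumptions made for illustration, and the record does not say which audio features or how many of them the study used.

```python
from statistics import fmean


def anova_f_scores(rows, labels):
    """Return the one-way ANOVA F statistic of every feature column.

    rows   -- list of equal-length feature vectors (hypothetical input)
    labels -- one class label per row (e.g. a dialect name)
    """
    # Group the rows by class label.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)

    n_samples = len(rows)
    n_groups = len(groups)
    scores = []
    for j in range(len(rows[0])):
        grand_mean = fmean([row[j] for row in rows])
        between = 0.0  # sum of squares between group means
        within = 0.0   # sum of squares inside each group
        for members in groups.values():
            column = [row[j] for row in members]
            group_mean = fmean(column)
            between += len(column) * (group_mean - grand_mean) ** 2
            within += sum((v - group_mean) ** 2 for v in column)
        ms_between = between / (n_groups - 1)
        ms_within = within / (n_samples - n_groups)
        if ms_within == 0:
            # Degenerate case: no spread inside the groups.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(ms_between / ms_within)
    return scores


def top_k_features(rows, labels, k):
    """Return the indices of the k features with the largest F statistic."""
    scores = anova_f_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (assumed): feature 0 separates the two classes, feature 1 does not.
toy_rows = [
    [1.0, 5.0],
    [1.2, 3.0],
    [0.8, 4.0],
    [3.0, 4.5],
    [3.2, 3.5],
    [2.8, 4.0],
]
toy_labels = ["a", "a", "a", "b", "b", "b"]

toy_scores = anova_f_scores(toy_rows, toy_labels)
toy_selected = top_k_features(toy_rows, toy_labels, k=1)
```

In this toy case the class means of feature 0 differ sharply while those of feature 1 coincide, so feature 0 is the one retained. In a pipeline like the one described, the retained columns would then be passed to the classifier.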
id doaj-art-b6d24b8a6f574bc0944a156270463259
institution OA Journals
issn 2410-9355
2307-549X
timestamp 2025-08-20T01:57:44Z
volume 13
issue 1
doi 10.14500/aro.11897
author_affiliation Computer Science Department, College of Science, University of Halabja, Halabja, 46018, Kurdistan Region - F.R. Iraq (all four authors)
topic 1D convolutional neural network
Data augmentation
Feature selection
Kurdish dialect identification
Sound feature