Advancing Arabic dialect detection with hybrid stacked transformer models
The rapid expansion of dialectally distinctive Arabic content on social media and the wider internet makes accurate dialect classification increasingly important for a variety of Natural Language Processing (NLP) applications...
Main Authors: | Hager Saleh, Abdulaziz AlMohimeed, Rasha Hassan, Mandour M. Ibrahim, Saeed Hamood Alsamhi, Moatamad Refaat Hassan, Sherif Mostafa |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-02-01 |
Series: | Frontiers in Human Neuroscience |
Subjects: | Arabic dialects; Bert-Base-Arabertv02; Dialectal-Arabic-XLM-R-Base; transformer; Knowledge representation; NLP |
Online Access: | https://www.frontiersin.org/articles/10.3389/fnhum.2025.1498297/full |
_version_ | 1823859189135441920 |
---|---|
author | Hager Saleh; Abdulaziz AlMohimeed; Rasha Hassan; Mandour M. Ibrahim; Saeed Hamood Alsamhi; Moatamad Refaat Hassan; Sherif Mostafa |
author_facet | Hager Saleh; Abdulaziz AlMohimeed; Rasha Hassan; Mandour M. Ibrahim; Saeed Hamood Alsamhi; Moatamad Refaat Hassan; Sherif Mostafa |
author_sort | Hager Saleh |
collection | DOAJ |
description | The rapid expansion of dialectally distinctive Arabic content on social media and the wider internet makes accurate dialect classification increasingly important for a variety of Natural Language Processing (NLP) applications. Recent advances in deep learning (DL) models have shown promise in overcoming the challenges of identifying Arabic dialects. In this paper, we propose a novel stacking model based on two transformer models, Bert-Base-Arabertv02 and Dialectal-Arabic-XLM-R-Base, to enhance the classification of dialectal Arabic. The proposed model consists of two levels: base models and a meta-learner. Level 1 generates class probabilities from the two transformer models for the training and testing sets; Level 2 then uses these probabilities to train and evaluate the meta-learner (see the illustrative sketch after this record). The stacking model is compared against several alternatives, including long short-term memory (LSTM), gated recurrent unit (GRU), and convolutional neural network (CNN) models with different word embeddings, as well as the two transformer models used individually. The results show that stacking the two models outperforms single-model approaches because it captures a broader range of linguistic features, leading to better generalization across different forms of Arabic. The proposed model is evaluated on the IADD and Shami datasets. On Shami, the Stacking-Transformer achieves the highest performance across all metrics compared to the other models, with 89.73% accuracy, 89.596% precision, 89.73% recall, and an 89.574% F1-score. On IADD, it again achieves the highest performance across all metrics, with 93.062% accuracy, 93.368% precision, 93.062% recall, and a 93.184% F1-score. This improvement in classification performance reflects the wider variety of linguistic features the model can capture, providing a reliable solution for precise Arabic dialect recognition and improving the efficacy of NLP applications. |
format | Article |
id | doaj-art-88ddf5771905431abda778023cf47a3c |
institution | Kabale University |
issn | 1662-5161 |
language | English |
publishDate | 2025-02-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Human Neuroscience |
spelling | doaj-art-88ddf5771905431abda778023cf47a3c 2025-02-11T06:59:47Z eng Frontiers Media S.A. Frontiers in Human Neuroscience 1662-5161 2025-02-01 vol. 19 10.3389/fnhum.2025.1498297 1498297 Advancing Arabic dialect detection with hybrid stacked transformer models.
Authors and affiliations: Hager Saleh (Faculty of Computers and Artificial Intelligence, Hurghada University, Hurghada, Egypt; Insight SFI Research Centre for Data Analytics, School of Engineering, University of Galway, Galway, Ireland; Atlantic Technological University, Letterkenny, Ireland); Abdulaziz AlMohimeed (Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia); Rasha Hassan (Department of Computer Science, Faculty of Science, Aswan University, Aswan, Egypt); Mandour M. Ibrahim (Information Technology Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia); Saeed Hamood Alsamhi (Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea); Moatamad Refaat Hassan (Department of Computer Science, Faculty of Science, Aswan University, Aswan, Egypt); Sherif Mostafa (Faculty of Computers and Artificial Intelligence, Hurghada University, Hurghada, Egypt).
Abstract: as given in the description field above.
https://www.frontiersin.org/articles/10.3389/fnhum.2025.1498297/full
Arabic dialects; Bert-Base-Arabertv02; Dialectal-Arabic-XLM-R-Base; transformer; Knowledge representation; NLP |
spellingShingle | Hager Saleh; Abdulaziz AlMohimeed; Rasha Hassan; Mandour M. Ibrahim; Saeed Hamood Alsamhi; Moatamad Refaat Hassan; Sherif Mostafa. Advancing Arabic dialect detection with hybrid stacked transformer models. Frontiers in Human Neuroscience. Arabic dialects; Bert-Base-Arabertv02; Dialectal-Arabic-XLM-R-Base; transformer; Knowledge representation; NLP |
title | Advancing Arabic dialect detection with hybrid stacked transformer models |
title_full | Advancing Arabic dialect detection with hybrid stacked transformer models |
title_fullStr | Advancing Arabic dialect detection with hybrid stacked transformer models |
title_full_unstemmed | Advancing Arabic dialect detection with hybrid stacked transformer models |
title_short | Advancing Arabic dialect detection with hybrid stacked transformer models |
title_sort | advancing arabic dialect detection with hybrid stacked transformer models |
topic | Arabic dialects; Bert-Base-Arabertv02; Dialectal-Arabic-XLM-R-Base; transformer; Knowledge representation; NLP |
url | https://www.frontiersin.org/articles/10.3389/fnhum.2025.1498297/full |
work_keys_str_mv | AT hagersaleh advancingarabicdialectdetectionwithhybridstackedtransformermodels AT abdulazizalmohimeed advancingarabicdialectdetectionwithhybridstackedtransformermodels AT rashahassan advancingarabicdialectdetectionwithhybridstackedtransformermodels AT mandourmibrahim advancingarabicdialectdetectionwithhybridstackedtransformermodels AT saeedhamoodalsamhi advancingarabicdialectdetectionwithhybridstackedtransformermodels AT moatamadrefaathassan advancingarabicdialectdetectionwithhybridstackedtransformermodels AT sherifmostafa advancingarabicdialectdetectionwithhybridstackedtransformermodels |
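To make the two-level scheme in the abstract concrete, here is a minimal, hedged sketch of probability-level stacking. Everything in it is an assumption for illustration, not the paper's implementation: the scikit-learn base classifiers stand in for the fine-tuned Bert-Base-Arabertv02 and Dialectal-Arabic-XLM-R-Base models, synthetic features stand in for dialect-labeled Arabic text, and the logistic-regression meta-learner is a placeholder, since the record does not specify the meta-learner's type.

```python
# Minimal sketch of the two-level stacking described in the abstract.
# Assumptions (not from the record): sklearn stand-ins replace the two
# fine-tuned transformers, synthetic features replace Arabic text, and
# logistic regression serves as the Level-2 meta-learner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Toy 4-class data standing in for dialect-labeled text features.
X, y = make_classification(n_samples=1000, n_features=50, n_informative=20,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Level 1: each base model emits class probabilities for both splits,
# mirroring the abstract's "class probabilities for training and testing sets".
base_models = [LogisticRegression(max_iter=1000), GaussianNB()]
train_probs, test_probs = [], []
for model in base_models:
    model.fit(X_train, y_train)
    train_probs.append(model.predict_proba(X_train))
    test_probs.append(model.predict_proba(X_test))

# Level 2: the concatenated class probabilities become the meta-learner's
# input features for training and evaluation.
meta_learner = LogisticRegression(max_iter=1000)
meta_learner.fit(np.hstack(train_probs), y_train)
print("stacked accuracy:", meta_learner.score(np.hstack(test_probs), y_test))
```

One caveat on the design this sketch follows: the abstract describes training the meta-learner on base-model probabilities computed over the same training set; stacking implementations often use out-of-fold predictions at Level 1 instead, to reduce the risk of the meta-learner overfitting to leaked training signal.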