An application of textual document classification for Arabic governmental correspondence

Bibliographic Details
Format:	Article
Language:	English
Published:	Elsevier 2025-01-01
Series:	Kuwait Journal of Science
Subjects:	BERT; contrastive learning; document classification; transfer learning
Online Access:	https://www.sciencedirect.com/science/article/pii/S230741082400124X

collection	DOAJ
description	The automation of classifying Arabic documents is becoming increasingly in demand, especially when dealing with an ever-growing amount of linguistic data. Natural language processing (NLP) has recently become one of the most significant fields in artificial intelligence (AI) thanks to recent advances in introducing transformer-based models. Transformers facilitate the use of reusable models by using pre-trained models (PTMs). This study aims to fine-tune monolingual (AraBERT (Antoun et al., 2020)), bilingual (GigaBERT (Lan et al., 2020)), and multilingual (XLM-RoBERTa (Conneau et al., 2020)) transformer-based encoder models to classify official Arabic correspondence in pre-defined classes and compare their predictive performance in terms of accuracy, using a new balanced dataset. The new balanced dataset has 22,741 Arabic texts and is categorized into six categories labeled with the most common ministries’ names. The results in this study show that GigaBERT achieved the highest accuracy rate of 98%. The implemented models may contribute to the domain of information systems (ISs) to facilitate the classification process in ministries without human intervention. © 2024 The Author(s)
format	Article
id	doaj-art-46c075ae762643f994e91e565df376e6
institution	Kabale University
issn	2307-4108 2307-4116
language	English
publishDate	2025-01-01
publisher	Elsevier
record_format	Article
series	Kuwait Journal of Science
citation	Kuwait Journal of Science, vol. 52, no. 1, article 100299 (2025-01-01)
doi	10.1016/j.kjs.2024.100299
title	An application of textual document classification for Arabic governmental correspondence
topic	BERT; contrastive learning; document classification; transfer learning
url	https://www.sciencedirect.com/science/article/pii/S230741082400124X

Similar Items