An application of textual document classification for Arabic governmental correspondence

The automation of Arabic document classification is increasingly in demand, especially as the volume of linguistic data keeps growing. Natural language processing (NLP) has become one of the most significant fields in artificial intelligence (AI) thanks to recent advance...

Bibliographic Details
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Kuwait Journal of Science
Subjects: bert; contrastive learning; document classification; transfer learning
Online Access: https://www.sciencedirect.com/science/article/pii/S230741082400124X
_version_ 1849329081164759040
collection DOAJ
description The automation of Arabic document classification is increasingly in demand, especially as the volume of linguistic data keeps growing. Natural language processing (NLP) has become one of the most significant fields in artificial intelligence (AI) thanks to recent advances in transformer-based models. Transformers make models reusable through pre-trained models (PTMs). This study aims to fine-tune monolingual (AraBERT (Antoun et al., 2020)), bilingual (GigaBERT (Lan et al., 2020)), and multilingual (XLM-RoBERTa (Conneau et al., 2020)) transformer-based encoder models to classify official Arabic correspondence into predefined classes and to compare their predictive performance in terms of accuracy on a new balanced dataset. The dataset contains 22,741 Arabic texts in six categories, each labeled with the name of one of the most common ministries. The results show that GigaBERT achieved the highest accuracy, 98%. The implemented models may contribute to the domain of information systems (ISs) by enabling ministries to classify correspondence without human intervention. © 2024 The Author(s)
format Article
id doaj-art-46c075ae762643f994e91e565df376e6
institution Kabale University
issn 2307-4108
2307-4116
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Kuwait Journal of Science
spelling doaj-art-46c075ae762643f994e91e565df376e6 | 2025-08-20T03:47:21Z | eng | Elsevier | Kuwait Journal of Science | 2307-4108 | 2307-4116 | 2025-01-01 | 52 | 1 | 100299 | 10.1016/j.kjs.2024.100299 | An application of textual document classification for Arabic governmental correspondence | https://www.sciencedirect.com/science/article/pii/S230741082400124X | bert | contrastive learning | document classification | transfer learning
title An application of textual document classification for Arabic governmental correspondence
title_sort application of textual document classification for arabic governmental correspondence
topic bert
contrastive learning
document classification
transfer learning
url https://www.sciencedirect.com/science/article/pii/S230741082400124X
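
The description in this record says the study compares fine-tuned models by accuracy on a balanced six-class dataset. The sketch below illustrates only that comparison step, under loud assumptions: the function names (accuracy, is_balanced, best_model), the class names c1 to c6, the model names, and all labels are hypothetical placeholders invented for illustration. They are not the study's code, data, or results. Accuracy is a reasonable headline metric here because the classes are balanced; on a skewed dataset it can be dominated by the largest class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def is_balanced(labels, tolerance=0.0):
    """True if class counts differ by at most `tolerance` times the largest class."""
    counts = Counter(labels).values()
    return (max(counts) - min(counts)) <= tolerance * max(counts)


def best_model(y_true, predictions):
    """Return (name, accuracy) for the model with the highest accuracy."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical toy data: six placeholder classes, one example per class.
y_true = ["c1", "c2", "c3", "c4", "c5", "c6"]

# Hypothetical predictions from three placeholder models.
predictions = {
    "model_a": ["c1", "c2", "c3", "c4", "c5", "c1"],  # 5 of 6 correct
    "model_b": ["c1", "c2", "c3", "c4", "c5", "c6"],  # 6 of 6 correct
    "model_c": ["c1", "c1", "c3", "c4", "c1", "c6"],  # 4 of 6 correct
}

if __name__ == "__main__":
    print("balanced:", is_balanced(y_true))
    print("best:", best_model(y_true, predictions))
```

In a real comparison, y_true would hold the held-out test labels and each entry of predictions would hold the argmax class output of one fine-tuned encoder on that same test split.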