Assessing BERT-based models for Arabic and low-resource languages in crime text classification

Bidirectional Encoder Representations from Transformers (BERT) has recently attracted considerable attention from researchers and practitioners, demonstrating notable effectiveness in various natural language processing (NLP) tasks, including text classification. This efficacy can be attributed to its architectural features, particularly its ability to process text using both left and right context, and to its pre-training on extensive datasets. In the criminal domain, classifying textual data is a crucial activity, and Transformers are increasingly recognized for their potential to support law enforcement efforts. BERT has been released in English and Chinese versions, as well as in a multilingual version that covers over 100 languages. However, there is a pressing need to analyze the availability and performance of BERT in Arabic and other low-resource languages. This study primarily focuses on analyzing BERT-based models tailored for the Arabic language; due to the limited number of existing studies in this area, the research extends to other low-resource languages. The study evaluates these models' performance against machine learning (ML), deep learning (DL), and other Transformer models. Furthermore, it assesses the availability of relevant data and examines the effectiveness of BERT-based models in low-resource linguistic contexts. The study concludes with recommendations for future research directions, supported by empirical statistical evidence.
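
For orientation, the sketch below (not taken from the paper) illustrates the kind of BERT-based pipeline the abstract describes: loading a pre-trained multilingual BERT encoder with a classification head for crime text classification, using the Hugging Face transformers library. The model name, label set, and example sentence are illustrative assumptions; in practice the classification head would first be fine-tuned on labeled crime reports, and the paper surveys Arabic-specific BERT variants as alternatives to the multilingual model used here.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Multilingual BERT covers 100+ languages, including Arabic.
MODEL_NAME = "bert-base-multilingual-cased"
# Hypothetical crime categories, for illustration only.
LABELS = ["theft", "fraud", "assault"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# The classification head added here is randomly initialized; it must be
# fine-tuned on labeled crime texts before predictions are meaningful.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

# Classify a single (hypothetical) Arabic crime report:
# "The suspect stole a car from in front of the house."
text = "سرق المتهم سيارة من أمام المنزل"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(LABELS[logits.argmax(dim=-1).item()])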

Bibliographic Details
Main Authors: Njood K. Al-harbi, Manal Alghieth
Format: Article
Language: English
Published: PeerJ Inc., 2025-07-01
Series: PeerJ Computer Science
ISSN: 2376-5992
DOI: 10.7717/peerj-cs.3017
Subjects: Artificial intelligence; Deep learning; Transformer; BERT; Text classification; Crime classification
Online Access: https://peerj.com/articles/cs-3017.pdf
Collection: DOAJ
Institution: Kabale University