Automating the search for legal information in Arabic: A novel approach to document retrieval


Bibliographic Details
Main Authors: K. S. Jafar, A. A. Mohammad, A. H. Issa, A. V. Panov
Format: Article
Language: Russian
Published: MIREA - Russian Technological University 2024-10-01
Series: Российский технологический журнал (Russian Technological Journal)
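Subjects: search for documents; nlp; top2vec; hdbscan; arabic legal documents; word embeddings; cosine similarity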
Online Access: https://www.rtj-mirea.ru/jour/article/view/977
author K. S. Jafar
A. A. Mohammad
A. H. Issa
A. V. Panov
collection DOAJ
description Objectives. The retrieval of legal information, including information on matters such as punishment for crimes and felonies, is a challenging task. The approach proposed in the article offers an efficient way to automate the retrieval of legal information without requiring a large amount of labeled data or significant computational resources. The work set out to analyze the feasibility of a document retrieval approach for Arabic legal texts using natural language processing and unsupervised clustering techniques. Methods. The Topic-to-Vector (Top2Vec) topic modeling algorithm, which generates document embeddings based on semantic context, is used to cluster Arabic legal texts into relevant topics. The HDBSCAN density-based clustering algorithm is also used to identify subtopics within each cluster. Challenges of working with Arabic legal text, such as morphological complexity, ambiguity, and a lack of standardized terminology, are addressed by a proposed preprocessing pipeline that includes tokenization, normalization, stemming, and stop-word removal. Results. A keyword-based evaluation of the approach on a dataset of Arabic legal texts demonstrated its effectiveness in terms of precision and recall: the proposed approach achieves 87% precision and 80% recall. This can significantly improve the search for legal documents, making the process faster and more accurate. Conclusions. The findings suggest that this approach can be a valuable tool for legal professionals and researchers navigating the complex landscape of Arabic legal information, improving efficiency and accuracy in legal information retrieval.
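The preprocessing steps named in the abstract (tokenization, normalization, stemming, stop-word removal) and the two reported retrieval metrics can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the normalization rules, the stop-word list `STOP_WORDS`, the prefix list `PREFIXES`, and all function names are assumptions chosen for illustration, and the metric function assumes the reported figures correspond to standard precision and recall.

```python
import re

# Arabic short-vowel marks (U+064B..U+0652) and the tatweel stretch character (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Runs of Arabic letters (hamza U+0621 through ya U+064A).
TOKEN = re.compile("[\u0621-\u064A]+")


def normalize(text):
    """Apply common (assumed) orthographic normalizations."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[\u0623\u0625\u0622]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return text


# Hypothetical, deliberately tiny stop-word list, stored in normalized form.
STOP_WORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن", "أو")}

# Hypothetical prefixes for light stemming: wa+al, bi+al, al.
PREFIXES = ("وال", "بال", "ال")


def light_stem(token):
    """Strip one leading prefix if at least three letters remain."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            return token[len(prefix):]
    return token


def preprocess(text):
    """Normalize, tokenize, drop stop words, then light-stem."""
    tokens = TOKEN.findall(normalize(text))
    return [light_stem(t) for t in tokens if t not in STOP_WORDS]


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Calling `preprocess` on a short phrase yields a list of stemmed tokens that could then be passed to an embedding or topic model; `precision_recall` takes the document IDs returned for a query and the IDs judged relevant. A production system would more likely rely on an established Arabic stemmer and a full stop-word list than on these toy rules.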
format Article
id doaj-art-0f2c126811094519894d314f3cd76045
institution Kabale University
issn 2782-3210
2500-316X
language Russian
publishDate 2024-10-01
publisher MIREA - Russian Technological University
record_format Article
series Российский технологический журнал (Russian Technological Journal)
spelling doaj-art-0f2c126811094519894d314f3cd76045
2025-08-20T03:57:27Z
rus
Volume 12, issue 5, pages 7-16
DOI 10.32362/2500-316X-2024-12-5-7-16
445
K. S. Jafar (MIREA – Russian Technological University)
A. A. Mohammad (HSE University)
A. H. Issa (Russian Biotechnological University)
A. V. Panov (MIREA – Russian Technological University)
title Automating the search for legal information in Arabic: A novel approach to document retrieval
topic search for documents
nlp
top2vec
hdbscan
arabic legal documents
word embeddings
cosine similarity
url https://www.rtj-mirea.ru/jour/article/view/977