Automating the search for legal information in Arabic: A novel approach to document retrieval
Objectives. The retrieval of legal information, including information related to issues such as punishment for crimes and felonies, represents a challenging task. The approach proposed in the article represents an efficient way to automate the retrieval of legal information without requiring a large...
| Main Authors: | K. S. Jafar, A. A. Mohammad, A. H. Issa, A. V. Panov |
|---|---|
| Format: | Article |
| Language: | Russian |
| Published: | MIREA - Russian Technological University, 2024-10-01 |
| Series: | Российский технологический журнал (Russian Technological Journal) |
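| Subjects: | search for documents; NLP; Top2Vec; HDBSCAN; Arabic legal documents; word embeddings; cosine similarity |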
| Online Access: | https://www.rtj-mirea.ru/jour/article/view/977 |
| _version_ | 1849249746242240512 |
|---|---|
| author | K. S. Jafar; A. A. Mohammad; A. H. Issa; A. V. Panov |
| author_sort | K. S. Jafar |
| collection | DOAJ |
| description | Objectives. The retrieval of legal information, including information on matters such as punishments for crimes and felonies, is a challenging task. The approach proposed in the article is an efficient way to automate the retrieval of legal information without requiring a large amount of labeled data or significant computational resources. The work set out to analyze the feasibility of a document retrieval approach for Arabic legal texts using natural language processing and unsupervised clustering techniques. Methods. The Topic-to-Vector (Top2Vec) topic modeling algorithm, which generates document embeddings based on semantic context, is used to cluster Arabic legal texts into relevant topics. The HDBSCAN density-based clustering algorithm is also used to identify subtopics within each cluster. Challenges of working with Arabic legal text, such as morphological complexity, ambiguity, and a lack of standardized terminology, are addressed by a proposed preprocessing pipeline that includes tokenization, normalization, stemming, and stop-word removal. Results. Evaluation of the approach on a dataset of Arabic legal texts, based on keywords, demonstrated its effectiveness in terms of accuracy and recall. The proposed approach achieves 87% accuracy and 80% recall (completeness). This can significantly improve the search for legal documents, making the process faster and more accurate. Conclusions. The findings suggest that this approach can be a valuable tool for legal professionals and researchers navigating the complex landscape of Arabic legal information, improving efficiency and accuracy in legal information retrieval. |
| format | Article |
| id | doaj-art-0f2c126811094519894d314f3cd76045 |
| institution | Kabale University |
| issn | 2782-3210 2500-316X |
| language | Russian |
| publishDate | 2024-10-01 |
| publisher | MIREA - Russian Technological University |
| record_format | Article |
| series | Российский технологический журнал |
| spelling | doaj-art-0f2c126811094519894d314f3cd76045; 2025-08-20T03:57:27Z; rus; MIREA - Russian Technological University; Российский технологический журнал; ISSN 2782-3210, 2500-316X; 2024-10-01; vol. 12, no. 5, pp. 7-16; DOI 10.32362/2500-316X-2024-12-5-7-16; 445; Automating the search for legal information in Arabic: A novel approach to document retrieval; K. S. Jafar (MIREA – Russian Technological University), A. A. Mohammad (HSE University), A. H. Issa (Russian Biotechnological University), A. V. Panov (MIREA – Russian Technological University); https://www.rtj-mirea.ru/jour/article/view/977; search for documents; nlp; top2vec; hdbscan; arabic legal documents; word embeddings; cosine similarity |
| title | Automating the search for legal information in Arabic: A novel approach to document retrieval |
| topic | search for documents; nlp; top2vec; hdbscan; arabic legal documents; word embeddings; cosine similarity |
| url | https://www.rtj-mirea.ru/jour/article/view/977 |
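
The description names a preprocessing pipeline of tokenization, normalization, stemming, and stop-word removal but gives no details. The following is a minimal sketch of what such a pipeline might look like in plain Python. Everything in it is an assumption made for illustration, not the authors' implementation: the function names, the tiny stop-word list, the specific normalization rules (diacritic removal, alef unification, final-letter folding), and the "light stemmer" that only strips the definite article.

```python
import re

# Assumption: a tiny illustrative stop-word list; the article's list is not given.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "أن"}

# Arabic short-vowel marks (U+064B..U+0652) and the tatweel elongation mark (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(text: str) -> str:
    """Apply common Arabic normalization rules (assumed, not from the article)."""
    text = DIACRITICS.sub("", text)    # drop diacritics and tatweel
    text = re.sub("[إأآ]", "ا", text)  # unify alef variants
    text = text.replace("ى", "ي")      # alef maqsura -> ya
    text = text.replace("ة", "ه")      # ta marbuta -> ha
    return text


def tokenize(text: str) -> list[str]:
    """Split into runs of Arabic letters, discarding punctuation and digits."""
    return re.findall(r"[\u0621-\u064A]+", text)


def light_stem(token: str) -> str:
    """Hypothetical light stemmer: strip only the definite article 'ال'."""
    if token.startswith("ال") and len(token) > 4:
        return token[2:]
    return token


# Stop words are normalized once so they match normalized tokens.
NORMALIZED_STOP_WORDS = {normalize(w) for w in STOP_WORDS}


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, remove stop words, then stem."""
    tokens = tokenize(normalize(text))
    return [light_stem(t) for t in tokens if t not in NORMALIZED_STOP_WORDS]
```

The resulting token lists could then be joined back into strings and passed to a topic-modeling library. A production pipeline would more likely rely on an established Arabic stemmer and a curated stop-word list than on the toy rules used here.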