Hajj-FQA: A benchmark Arabic dataset for developing question-answering systems on Hajj fatwas

Bibliographic Details
Main Authors: Hayfa A. Aleid, Aqil M. Azmi
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Journal of King Saud University: Computer and Information Sciences
Online Access: https://doi.org/10.1007/s44443-025-00128-w
Description
Summary: Deep learning has significantly advanced question-answering (QA) systems across many sectors. However, Arabic-language systems for Hajj-related fatwas (non-binding Islamic legal opinions issued by muftis) remain underdeveloped. This paper introduces Hajj-FQA, a benchmark Arabic dataset designed to support the development of HajjBot, a specialized chatbot for fatwa QA during the annual Hajj pilgrimage. The dataset captures the distinctive linguistic and jurisprudential characteristics of pilgrims' inquiries, enabling accurate, domain-specific responses. We present a comprehensive quantitative analysis of the dataset's construction methodology and its distinctive question-answer patterns. Evaluation with multilingual and Arabic-specific language models on three tasks, namely machine reading comprehension (MRC), duplicate question detection (DQD), and duplicate answer detection (DAD), using 10-fold cross-validation demonstrates the practical utility of Hajj-FQA. Results show exceptional performance on the classification tasks (AraBERTv0.2 achieved a precision of 99.19% for DQD and 99.26% for DAD) and strong extractive answering capability, with an F-score of 72.78%. Generative performance reached a BERT-F score of 71.4% (AraBART), while variability in the MRC results highlights the challenges of religious reasoning. These findings establish Hajj-FQA as both (1) a critical resource for developing specialized fatwa chatbots such as HajjBot and (2) a benchmark for Arabic religious QA systems. The dataset directly addresses the urgent need for accurate, automated fatwa assistance during Hajj, while providing insights for future improvements in Islamic NLP applications.
ISSN: 1319-1578, 2213-1248
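
The summary reports an F-score for extractive answering and a 10-fold cross-validation protocol. As a rough illustration only, the sketch below computes a token-overlap F-score between a predicted and a reference answer, and builds 10-fold train/test index splits. The abstract does not state how the paper computes its F-score or forms its folds, so whitespace tokenization, the overlap definition, and the round-robin fold assignment are all assumptions; `token_f_score` and `k_fold_indices` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def token_f_score(prediction: str, reference: str) -> float:
    """Token-overlap F-score between two answer strings.

    Assumption: tokens are whitespace-separated; the paper's actual
    tokenization and normalization are not given in the abstract.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: items are assigned to folds round-robin without
    shuffling; a real setup would usually shuffle or stratify first.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


if __name__ == "__main__":
    # Illustrative strings only; not taken from the dataset.
    print(token_f_score("a b c", "a b d"))
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

Averaging `token_f_score` over the held-out items of each fold, and then over the folds, would give one cross-validated figure of the kind the summary quotes, though the exact aggregation used in the paper is not specified here.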