Hajj-FQA: A benchmark Arabic dataset for developing question-answering systems on Hajj fatwas
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-07-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s44443-025-00128-w |
| Summary: | Deep learning has significantly advanced question-answering (QA) systems across various sectors. However, Arabic-language systems for Hajj-related fatwas (non-binding Islamic legal opinions issued by muftis) remain underdeveloped. This paper introduces Hajj-FQA, a benchmark Arabic dataset specifically designed to develop HajjBot, a specialized chatbot for fatwa QA during the annual Hajj pilgrimage. The dataset captures the unique linguistic and jurisprudential characteristics of pilgrims’ inquiries, enabling accurate, domain-specific responses. We present a comprehensive quantitative analysis of the dataset’s construction methodology and its distinctive question-answer patterns. We evaluate multilingual and Arabic-specific language models on three tasks with 10-fold cross-validation: machine reading comprehension (MRC), duplicate question detection (DQD), and duplicate answer detection (DAD). The evaluation demonstrates the practical utility of Hajj-FQA. Results show exceptional performance in classification tasks (AraBERTv0.2 achieved a precision score of 99.19% for DQD and 99.26% for DAD) and strong extractive answering capability with an $F$-score of 72.78%. While generative performance reached a BERT-$F$ score of 71.4% (AraBART), MRC variability highlights challenges in religious reasoning. These findings establish Hajj-FQA as both (1) a critical resource for developing specialized fatwa chatbots like HajjBot and (2) a benchmark for Arabic religious QA systems. The dataset directly addresses the urgent need for accurate, automated fatwa assistance during Hajj, while providing insights for future improvements in Islamic NLP applications. |
| ISSN: | 1319-1578, 2213-1248 |