Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)

The Holy Quran serves as a foundational text in Islamic theology and has been translated into numerous languages across the globe. This paper introduces a manual translation of the Holy Quran into the Kurdish language, specifically designed to aid natural language processing (NLP) research and lingu...

Bibliographic Details
Main Authors: Muhammad Bamoki, Shakhawan Hares Wady, Soran Badawi
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Data in Brief
Subjects: Quranic text; Machine learning; Low-resource languages; Sorani Quran translation; Preprocessing
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340925002653
_version_ 1849470649436733440
author Muhammad Bamoki
Shakhawan Hares Wady
Soran Badawi
author_facet Muhammad Bamoki
Shakhawan Hares Wady
Soran Badawi
author_sort Muhammad Bamoki
collection DOAJ
description The Holy Quran serves as a foundational text in Islamic theology and has been translated into numerous languages across the globe. This paper introduces a manual translation of the Holy Quran into the Kurdish language, specifically designed to aid natural language processing (NLP) research and linguistic analysis. The translation process employed a thorough methodology that combined advanced linguistic tools with the expertise of bilingual religious scholars, translators, and professional proofreaders over several years. Careful attention was given to maintaining both semantic accuracy and theological precision, ensuring a faithful representation of the original Arabic text. The dataset comprises two primary files: a raw translation and a refined linguistic version. We performed various statistical analyses, including the identification of the top 20 most frequent words, a comparative analysis of verse lengths between the Kurdish and Arabic versions, and an evaluation of unique word distributions in both the raw and processed texts. This Kurdish Quran translation dataset represents a significant resource for computational linguistics, particularly in the development of neural machine translation models and in linguistic research focused on under-resourced languages.
format Article
id doaj-art-73a0d1303a8446f0bf50fefdb5e861d1
institution Kabale University
issn 2352-3409
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj-art-73a0d1303a8446f0bf50fefdb5e861d1
2025-08-20T03:25:05Z
eng
Elsevier
Data in Brief
2352-3409
2025-06-01
60
111533
10.1016/j.dib.2025.111533
Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)
Muhammad Bamoki (0)
Shakhawan Hares Wady (1)
Soran Badawi (2)
(0) General Directorate of Awqaf / Sulaimaniyah, KRG, Kurdistan, Iraq
(1) Department of Business Administration, Charmo University, KRG, Chamchamal, Kurdistan, Iraq
(2) Language Center, Charmo University, KRG, Chamchamal, Kurdistan, Iraq; Corresponding author.
http://www.sciencedirect.com/science/article/pii/S2352340925002653
Quranic text
Machine learning
Low-resource languages
Sorani Quran translation
Preprocessing
spellingShingle Muhammad Bamoki
Shakhawan Hares Wady
Soran Badawi
Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)
Data in Brief
Quranic text
Machine learning
Low-resource languages
Sorani Quran translation
Preprocessing
title Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)
title_full Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)
title_fullStr Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)
title_full_unstemmed Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)
title_short Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)
title_sort holy quran kurdish sorani translation dataset for language modelling mendeley data
topic Quranic text
Machine learning
Low-resource languages
Sorani Quran translation
Preprocessing
url http://www.sciencedirect.com/science/article/pii/S2352340925002653
work_keys_str_mv AT muhammadbamoki holyqurankurdishsoranitranslationdatasetforlanguagemodellingmendeleydata
AT shakhawanhareswady holyqurankurdishsoranitranslationdatasetforlanguagemodellingmendeleydata
AT soranbadawi holyqurankurdishsoranitranslationdatasetforlanguagemodellingmendeleydata
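
The description above mentions three statistical analyses: the top 20 most frequent words, a comparison of verse lengths between the Kurdish and Arabic versions, and unique word counts. The following is a minimal sketch of how such statistics could be computed, not the authors' actual procedure. It assumes each version is available as a list of verse strings in matching order, and it uses simple whitespace tokenization with surrounding punctuation stripped; the function names and the punctuation set are hypothetical choices made for illustration.

```python
from collections import Counter
import string

# Assumption: ASCII punctuation plus a few Arabic-script marks are stripped
# from token edges. The dataset's real preprocessing may differ.
PUNCT = string.punctuation + "،؛؟«»"


def tokenize(verse):
    """Split a verse on whitespace and strip punctuation from token edges."""
    stripped = (word.strip(PUNCT) for word in verse.split())
    return [token for token in stripped if token]


def top_words(verses, k=20):
    """Return the k most frequent tokens as (token, count) pairs."""
    counts = Counter(token for verse in verses for token in tokenize(verse))
    return counts.most_common(k)


def verse_lengths(verses):
    """Return the number of tokens in each verse."""
    return [len(tokenize(verse)) for verse in verses]


def unique_word_count(verses):
    """Return the number of distinct tokens across all verses."""
    return len({token for verse in verses for token in tokenize(verse)})


def length_differences(kurdish_verses, arabic_verses):
    """Return per-verse token-count differences (Kurdish minus Arabic)."""
    if len(kurdish_verses) != len(arabic_verses):
        raise ValueError("both versions must contain the same number of verses")
    pairs = zip(verse_lengths(kurdish_verses), verse_lengths(arabic_verses))
    return [k_len - a_len for k_len, a_len in pairs]
```

Running unique_word_count on the raw file and again on the refined file would give the kind of raw-versus-processed vocabulary comparison the description refers to. Whitespace tokenization is used here because it leaves zero-width non-joiners inside Sorani words intact; a different tokenizer would change the counts.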