Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data)
The Holy Quran serves as a foundational text in Islamic theology and has been translated into numerous languages across the globe. This paper introduces a manual translation of the Holy Quran into the Kurdish language, specifically designed to aid natural language processing (NLP) research and lingu...
| Main Authors: | Muhammad Bamoki, Shakhawan Hares Wady, Soran Badawi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Data in Brief |
| Subjects: | Quranic text; Machine learning; Low-resource languages; Sorani Quran translation; Preprocessing |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340925002653 |
| _version_ | 1849470649436733440 |
|---|---|
| author | Muhammad Bamoki; Shakhawan Hares Wady; Soran Badawi |
| author_facet | Muhammad Bamoki; Shakhawan Hares Wady; Soran Badawi |
| author_sort | Muhammad Bamoki |
| collection | DOAJ |
| description | The Holy Quran serves as a foundational text in Islamic theology and has been translated into numerous languages across the globe. This paper introduces a manual translation of the Holy Quran into the Kurdish language, specifically designed to aid natural language processing (NLP) research and linguistic analysis. The translation process employed a thorough methodology that combined advanced linguistic tools with the expertise of bilingual religious scholars, translators, and professional proofreaders over several years. Careful attention was given to maintaining both semantic accuracy and theological precision, ensuring a faithful representation of the original Arabic text. The dataset comprises two primary files: a raw translation and a refined linguistic version. We performed various statistical analyses, including the identification of the top 20 most frequent words, a comparative analysis of verse lengths between the Kurdish and Arabic versions, and an evaluation of unique word distributions in both the raw and processed texts. This Kurdish Quran translation dataset represents a significant resource for computational linguistics, particularly in the development of neural machine translation models and in linguistic research focused on under-resourced languages. |
| format | Article |
| id | doaj-art-73a0d1303a8446f0bf50fefdb5e861d1 |
| institution | Kabale University |
| issn | 2352-3409 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Data in Brief |
| spelling | doaj-art-73a0d1303a8446f0bf50fefdb5e861d1; 2025-08-20T03:25:05Z; eng; Elsevier; Data in Brief; 2352-3409; 2025-06-01; 60; 111533; 10.1016/j.dib.2025.111533; Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data); Muhammad Bamoki [0]; Shakhawan Hares Wady [1]; Soran Badawi [2]; [0] General Directorate of Awqaf / Sulaimaniyah, KRG, Kurdistan, Iraq; [1] Department of Business Administration, Charmo University, KRG, Chamchamal, Kurdistan, Iraq; [2] Language Center, Charmo University, KRG, Chamchamal, Kurdistan, Iraq; Corresponding author. The Holy Quran serves as a foundational text in Islamic theology and has been translated into numerous languages across the globe. This paper introduces a manual translation of the Holy Quran into the Kurdish language, specifically designed to aid natural language processing (NLP) research and linguistic analysis. The translation process employed a thorough methodology that combined advanced linguistic tools with the expertise of bilingual religious scholars, translators, and professional proofreaders over several years. Careful attention was given to maintaining both semantic accuracy and theological precision, ensuring a faithful representation of the original Arabic text. The dataset comprises two primary files: a raw translation and a refined linguistic version. We performed various statistical analyses, including the identification of the top 20 most frequent words, a comparative analysis of verse lengths between the Kurdish and Arabic versions, and an evaluation of unique word distributions in both the raw and processed texts. This Kurdish Quran translation dataset represents a significant resource for computational linguistics, particularly in the development of neural machine translation models and in linguistic research focused on under-resourced languages. http://www.sciencedirect.com/science/article/pii/S2352340925002653; Quranic text; Machine learning; Low-resource languages; Sorani Quran translation; Preprocessing |
| spellingShingle | Muhammad Bamoki; Shakhawan Hares Wady; Soran Badawi; Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data); Data in Brief; Quranic text; Machine learning; Low-resource languages; Sorani Quran translation; Preprocessing |
| title | Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data) |
| title_full | Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data) |
| title_fullStr | Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data) |
| title_full_unstemmed | Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data) |
| title_short | Holy Quran Kurdish Sorani translation dataset for language modelling (Mendeley Data) |
| title_sort | holy quran kurdish sorani translation dataset for language modelling (mendeley data) |
| topic | Quranic text; Machine learning; Low-resource languages; Sorani Quran translation; Preprocessing |
| url | http://www.sciencedirect.com/science/article/pii/S2352340925002653 |
| work_keys_str_mv | AT muhammadbamoki holyqurankurdishsoranitranslationdatasetforlanguagemodellingmendeleydata AT shakhawanhareswady holyqurankurdishsoranitranslationdatasetforlanguagemodellingmendeleydata AT soranbadawi holyqurankurdishsoranitranslationdatasetforlanguagemodellingmendeleydata |
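
The description above names three statistical analyses: the top 20 most frequent words, a comparison of verse lengths between the Kurdish and Arabic versions, and unique word counts. The following is a minimal sketch of how such statistics might be computed, not the authors' method. It assumes one verse per list entry, verse-aligned Kurdish and Arabic lists, and plain whitespace tokenization; the function names and the placeholder tokens are hypothetical and do not come from the dataset.

```python
from collections import Counter
from typing import List, Tuple


def tokenize(verse: str) -> List[str]:
    # Assumption: whitespace splitting only. The record does not say how
    # the dataset's "refined" version was actually preprocessed.
    return verse.split()


def top_words(verses: List[str], k: int = 20) -> List[Tuple[str, int]]:
    # Count every token across all verses and return the k most frequent.
    counts = Counter(tok for verse in verses for tok in tokenize(verse))
    return counts.most_common(k)


def compare_verse_lengths(kurdish: List[str], arabic: List[str]) -> List[Tuple[int, int]]:
    # Return (Kurdish token count, Arabic token count) for each aligned verse.
    if len(kurdish) != len(arabic):
        raise ValueError("verse lists must be aligned one-to-one")
    return [(len(tokenize(k)), len(tokenize(a))) for k, a in zip(kurdish, arabic)]


def unique_word_count(verses: List[str]) -> int:
    # Number of distinct tokens in the given text.
    return len({tok for verse in verses for tok in tokenize(verse)})


if __name__ == "__main__":
    # Placeholder tokens, not real verse text.
    kurdish_verses = ["a b a", "c a"]
    arabic_verses = ["x y", "z"]
    print(top_words(kurdish_verses, k=20))
    print(compare_verse_lengths(kurdish_verses, arabic_verses))
    print(unique_word_count(kurdish_verses))
```

Running the same functions on both the raw and the refined file would give the raw-versus-processed comparison the description mentions; a real pipeline would likely also need Kurdish-specific normalization, which this sketch leaves out.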