SauDial: The Saudi Arabic dialects game localization dataset (Mendeley Data)
Content creation and localization for video games demand substantial effort from script writers and localization teams. Consequently, we present SauDial, the Saudi Arabic Dialects Game Localization Parallel Dataset, a collection of Saudi dialectal expressions tailored for localization-related tasks....
| Main Authors: | Naif Alanazi, Mohammed Al-Batineh, Hussein Abu-Rayyash |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-10-01 |
| Series: | Data in Brief |
| Subjects: | Game localization; Cultural adaptation; Arabic gaming; Dialect corpus; Large language models |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340925006304 |
| _version_ | 1849240761587990528 |
|---|---|
| author | Naif Alanazi; Mohammed Al-Batineh; Hussein Abu-Rayyash |
| collection | DOAJ |
| description | Content creation and localization for video games demand substantial effort from script writers and localization teams. Consequently, we present SauDial, the Saudi Arabic Dialects Game Localization Parallel Dataset, a collection of Saudi dialectal expressions tailored for localization-related tasks. The corpus features samples from four Saudi dialects, namely Najdi, Hijazi, Janoubi, and Eastern. The dataset was first produced through an AI‑driven process informed by cultural knowledge, linguistic expertise, and game‑specific context, then manually cleaned, refined, and revised to ensure dialectal accuracy, tonal appropriateness, and cultural and semantic fidelity. Each entry contains an English source line, a Modern Standard Arabic (MSA) translation, and a dialectal counterpart together with context clues, age ratings, and linguistic notes. The dataset spans a broad array of scenarios relevant to multiple game genres and tonal indicators, and it aligns with the General Authority of Media Regulation (GCAM) official rating system. In addition, it opens avenues for research in Translation, Cultural, Localization, and Game Studies, while in educational settings it can support translation and localization courses and serve as a translation memory that aids professional translators and localizers. To the best of our knowledge, SauDial is the first dataset of its kind in game localization and offers a foundation that can strengthen the authenticity and cultural resonance of games localized for the Saudi market. |
| format | Article |
| id | doaj-art-dd5ca8a1a8a944bc8e27b808bfb3e44b |
| institution | Kabale University |
| issn | 2352-3409 |
| language | English |
| publishDate | 2025-10-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Data in Brief |
| spelling | doaj-art-dd5ca8a1a8a944bc8e27b808bfb3e44b; 2025-08-20T04:00:27Z; eng; Elsevier; Data in Brief; ISSN 2352-3409; 2025-10-01; vol. 62, article 111906; DOI 10.1016/j.dib.2025.111906; SauDial: The Saudi Arabic dialects game localization dataset (Mendeley Data); Naif Alanazi (Department of English, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia; corresponding author); Mohammed Al-Batineh (Department of Languages and Literature, United Arab Emirates University, United Arab Emirates); Hussein Abu-Rayyash (Department of Modern & Classical Language Studies, Kent State University, Kent, USA); http://www.sciencedirect.com/science/article/pii/S2352340925006304; Game localization; Cultural adaptation; Arabic gaming; Dialect corpus; Large language models |
| title | SauDial: The Saudi Arabic dialects game localization dataset (Mendeley Data) |
| topic | Game localization; Cultural adaptation; Arabic gaming; Dialect corpus; Large language models |
| url | http://www.sciencedirect.com/science/article/pii/S2352340925006304 |