SauDial: The Saudi Arabic dialects game localization dataset (Mendeley Data)

Bibliographic Details
Main Authors: Naif Alanazi, Mohammed Al-Batineh, Hussein Abu-Rayyash
Format: Article
Language: English
Published: Elsevier, 2025-10-01
Series: Data in Brief
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340925006304
Collection: DOAJ
Description: Content creation and localization for video games demand substantial effort from script writers and localization teams. To support this work, we present SauDial, the Saudi Arabic Dialects Game Localization Parallel Dataset, a collection of Saudi dialectal expressions tailored for localization-related tasks. The corpus features samples from four Saudi dialects: Najdi, Hijazi, Janoubi, and Eastern. The dataset was first produced through an AI-driven process informed by cultural knowledge, linguistic expertise, and game-specific context, and then manually cleaned, refined, and revised to ensure dialectal accuracy, tonal appropriateness, and cultural and semantic fidelity. Each entry contains an English source line, a Modern Standard Arabic (MSA) translation, and a dialectal counterpart, together with context clues, age ratings, and linguistic notes. The dataset spans a broad array of scenarios relevant to multiple game genres and tonal indicators, and it aligns with the official rating system of the General Authority of Media Regulation (GCAM). In addition, it opens avenues for research in Translation, Cultural, Localization, and Game Studies; in educational settings, it can support translation and localization courses and serve as a translation memory that aids professional translators and localizers. To the best of our knowledge, SauDial is the first dataset of its kind in game localization, and it offers a foundation that can strengthen the authenticity and cultural resonance of games localized for the Saudi market.
ISSN: 2352-3409
Volume: 62, Article 111906
DOI: 10.1016/j.dib.2025.111906
Affiliations:
Naif Alanazi (corresponding author): Department of English, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
Mohammed Al-Batineh: Department of Languages and Literature, United Arab Emirates University, United Arab Emirates
Hussein Abu-Rayyash: Department of Modern & Classical Language Studies, Kent State University, Kent, USA
Keywords: Game localization; Cultural adaptation; Arabic gaming; Dialect corpus; Large language models