Machine Learning-Based Identification of Phonological Biomarkers for Speech Sound Disorders in Saudi Arabic-Speaking Children

Bibliographic Details
Main Authors: Deema F. Turki, Ahmad F. Turki
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Diagnostics
Online Access: https://www.mdpi.com/2075-4418/15/11/1401

Description
<b>Background/Objectives:</b> This study investigates the application of machine learning (ML) techniques to the diagnosis of speech sound disorders (SSDs) in Saudi Arabic-speaking children. It focuses on phonological biomarkers, particularly Infrequent Variance (InfrVar), as a means of improving diagnostic accuracy. SSDs are a significant concern in pediatric speech pathology, affecting an estimated 10–15% of preschool-aged children worldwide. Accurate diagnosis nevertheless remains challenging, especially in linguistically diverse populations, and traditional diagnostic tools such as the Percentage of Consonants Correct (PCC) often fail to capture subtle phonological variations. This study explores whether ML models can improve diagnostic accuracy by incorporating culturally relevant phonological biomarkers such as InfrVar, with the aim of developing a more effective diagnostic approach for SSDs in Saudi Arabic-speaking children.

<b>Methods:</b> Data from 235 Saudi Arabic-speaking children aged 2;6 to 5;11 (years;months) were analyzed using six ML models: Random Forest, Support Vector Machine (SVM), XGBoost, Logistic Regression, K-Nearest Neighbors, and Naïve Bayes. The models classified speech patterns into four categories: Atypical, Typical Development (TD), Articulation, and Delay. Phonological features, including Phonological Variance (PhonVar), InfrVar, and PCC, served as key variables. SHapley Additive exPlanations (SHAP) analysis was used to interpret the contribution of individual features to model predictions.

<b>Results:</b> The XGBoost and Random Forest models performed best, with an accuracy of 91.49% and an AUC of 99.14%. SHAP analysis showed that articulation patterns and phonological patterns were the most influential features for distinguishing between the Atypical and TD categories. K-Means clustering identified four distinct subgroups based on speech development patterns: TD (46.61%), Articulation (25.42%), Atypical (18.64%), and Delay (9.32%).

<b>Conclusions:</b> ML models, particularly XGBoost and Random Forest, effectively classified speech development categories in Saudi Arabic-speaking children. The study highlights the importance of incorporating culturally specific phonological biomarkers such as InfrVar and PhonVar to improve diagnostic precision for SSDs. These findings lay the groundwork for developing AI-assisted diagnostic tools tailored to diverse linguistic contexts and for enhancing early intervention strategies in pediatric speech pathology.
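
The description names two computations without showing them: PCC as a baseline measure, and a K-Means step that split children into four subgroups reported as percentages. The following is a minimal Python sketch of both, not the authors' code, under loudly stated assumptions: target and produced consonants are already aligned one per slot, the feature vectors are invented placeholders (the record does not say how PhonVar or InfrVar are computed or which features were clustered), and K-Means is a plain Lloyd's-algorithm implementation with random initialisation. The helper names `pcc`, `standardize`, `kmeans`, and `cluster_shares` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pcc(target: Sequence[str], produced: Sequence[str]) -> float:
    """Percentage of Consonants Correct: correct consonants / target consonants x 100.

    Assumes the two sequences are already aligned one consonant per slot
    (a simplification; real PCC scoring follows transcription conventions).
    """
    if len(target) != len(produced):
        raise ValueError("target and produced must be aligned to the same length")
    if not target:
        raise ValueError("sample contains no target consonants")
    correct = sum(1 for t, p in zip(target, produced) if t == p)
    return 100.0 * correct / len(target)


def standardize(points: List[Point]) -> List[Point]:
    """Z-score each feature column so PCC (0-100) does not dominate distances."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    # Population SD; a constant column gets SD 1.0 to avoid dividing by zero.
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0 for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, sds)) for p in points]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's-algorithm K-Means; returns one cluster label per point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen points
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not move either
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_shares(labels: List[int], k: int) -> List[float]:
    """Percentage of samples in each cluster; the values sum to 100."""
    return [100.0 * labels.count(c) / len(labels) for c in range(k)]


if __name__ == "__main__":
    # One child's aligned target vs produced consonants (hypothetical, not study data).
    print(pcc(["b", "t", "k", "s"], ["b", "t", "t", "s"]))  # prints 75.0

    # Hypothetical per-child vectors: (PCC, placeholder_2, placeholder_3).
    children = [(95.0, 0.1, 0.0), (92.0, 0.2, 0.1), (70.0, 0.4, 0.3), (68.0, 0.5, 0.2),
                (55.0, 0.9, 0.8), (50.0, 1.0, 0.9), (80.0, 0.3, 0.7), (78.0, 0.2, 0.8)]
    labels = kmeans(standardize(children), k=4)
    for c, share in enumerate(cluster_shares(labels, 4)):
        print(f"cluster {c}: {share:.2f}%")
```

The printed shares play the same role as the subgroup percentages in the Results, but here they come from made-up data. Because Lloyd's algorithm can settle in a local optimum, the labels depend on `seed`; a common practice is to run several seeds and keep the solution with the lowest within-cluster sum of squares.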

Record Details
ISSN: 2075-4418
Volume/Issue: 15(11), article 1401
DOI: 10.3390/diagnostics15111401
Collection: DOAJ
Affiliations: Deema F. Turki, Speech and Hearing Pathology Department, Faculty of Medical Rehabilitation Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Ahmad F. Turki, Electrical and Computer Engineering Department, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Keywords: speech sound disorders (SSDs); machine learning (ML); infrequent variance (InfrVar); phonological development; Saudi Arabic-speaking children