Machine Learning-Based Identification of Phonological Biomarkers for Speech Sound Disorders in Saudi Arabic-Speaking Children

Bibliographic Details
Main Authors:	Deema F. Turki, Ahmad F. Turki
Format:	Article
Language:	English
Published:	MDPI AG, 2025-05-01
Series:	Diagnostics
Subjects:	speech sound disorders (SSDs); machine learning (ML); infrequent variance (InfrVar); phonological development; Saudi Arabic-speaking children
Online Access:	https://www.mdpi.com/2075-4418/15/11/1401
ISSN:	2075-4418
Volume/Issue:	Vol. 15, No. 11, Article 1401
DOI:	10.3390/diagnostics15111401
Collection:	DOAJ
Affiliations:	Speech and Hearing Pathology Department, Faculty of Medical Rehabilitation Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia (Deema F. Turki); Electrical and Computer Engineering Department, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia (Ahmad F. Turki)

Abstract

Background/Objectives: This study investigates the application of machine learning (ML) techniques to diagnosing speech sound disorders (SSDs) in Saudi Arabic-speaking children, focusing on phonological biomarkers, particularly Infrequent Variance (InfrVar), to improve diagnostic accuracy. SSDs are a significant concern in pediatric speech pathology, affecting an estimated 10–15% of preschool-aged children worldwide. However, accurate diagnosis remains challenging, especially in linguistically diverse populations. Traditional diagnostic tools, such as the Percentage of Consonants Correct (PCC), often fail to capture subtle phonological variations. This study explores whether ML models that incorporate culturally relevant phonological biomarkers such as InfrVar can improve diagnostic accuracy, with the aim of developing a more effective diagnostic approach for SSDs in Saudi Arabic-speaking children.

Methods: Data from 235 Saudi Arabic-speaking children aged 2;6 to 5;11 (years;months) were analyzed using six ML models: Random Forest, Support Vector Machine (SVM), XGBoost, Logistic Regression, K-Nearest Neighbors, and Naïve Bayes. The models classified speech patterns into four categories: Atypical, Typical Development (TD), Articulation, and Delay. Phonological features, including Phonological Variance (PhonVar), InfrVar, and PCC, served as the key variables. SHapley Additive exPlanations (SHAP) analysis was used to interpret the contribution of individual features to model predictions.

Results: The XGBoost and Random Forest models performed best, with an accuracy of 91.49% and an area under the ROC curve (AUC) of 99.14%. SHAP analysis showed that articulation patterns and phonological patterns were the most influential features for distinguishing the Atypical and TD categories. K-Means clustering identified four distinct subgroups based on speech development patterns: TD (46.61%), Articulation (25.42%), Atypical (18.64%), and Delay (9.32%).

Conclusions: ML models, particularly XGBoost and Random Forest, effectively classified speech development categories in Saudi Arabic-speaking children. This study highlights the importance of incorporating culturally specific phonological biomarkers such as InfrVar and PhonVar to improve diagnostic precision for SSDs. These findings lay the groundwork for AI-assisted diagnostic tools tailored to diverse linguistic contexts, with the potential to strengthen early intervention strategies in pediatric speech pathology.
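
The abstract uses the Percentage of Consonants Correct (PCC) as the traditional baseline measure. PCC is conventionally computed as the number of consonants produced correctly divided by the number of consonants attempted, multiplied by 100. The minimal sketch below assumes consonant-level scoring is already available as (target, produced) pairs and scores only exact matches as correct; the function name `percentage_consonants_correct` and the sample productions are hypothetical and do not reflect the study's scoring rules or data.

```python
from typing import Sequence, Tuple

def percentage_consonants_correct(pairs: Sequence[Tuple[str, str]]) -> float:
    """PCC = 100 * (consonants produced correctly) / (consonants attempted).

    Each pair is (target consonant, produced consonant). In this sketch a
    production is correct only on an exact match; an omission is encoded as "".
    """
    if not pairs:
        raise ValueError("PCC is undefined when no consonants were attempted")
    correct = sum(1 for target, produced in pairs if target == produced)
    return 100.0 * correct / len(pairs)

# Hypothetical scoring of eight target consonants (illustrative, not study data).
sample = [
    ("b", "b"), ("t", "t"), ("k", "t"),  # /k/ produced as [t] (fronting)
    ("s", "s"), ("r", "l"), ("m", "m"),
    ("q", ""),                            # /q/ omitted
    ("d", "d"),
]
print(f"PCC = {percentage_consonants_correct(sample):.1f}%")  # PCC = 62.5%
```

Because PCC collapses every error type into a single proportion, a substitution and an omission lower it by the same amount, which is one way to read the abstract's point that PCC can miss subtler phonological variation.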
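
The abstract reports accuracy and AUC for a four-class problem but does not say how the AUC was averaged over classes. A common choice, assumed here, is the unweighted (macro) mean of one-vs-rest AUCs, where each binary AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes that quantity, along with argmax accuracy, from hypothetical predicted probabilities; the class names come from the abstract, while the function names and numbers are invented for illustration. scikit-learn's `roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")` computes the same macro one-vs-rest quantity.

```python
from typing import Dict, List, Sequence

CLASSES = ["Atypical", "TD", "Articulation", "Delay"]

def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(probs: Sequence[Dict[str, float]], labels: Sequence[str]) -> float:
    """Unweighted mean of the one-vs-rest AUC for each class."""
    aucs = [binary_auc([p[c] for p in probs], [y == c for y in labels]) for c in CLASSES]
    return sum(aucs) / len(aucs)

def accuracy(probs: Sequence[Dict[str, float]], labels: Sequence[str]) -> float:
    """Share of children whose highest-probability class matches the true class."""
    predictions = [max(p, key=p.get) for p in probs]
    return sum(pred == y for pred, y in zip(predictions, labels)) / len(labels)

# Hypothetical predicted probabilities for four children (invented, not study output).
probs: List[Dict[str, float]] = [
    {"Atypical": 0.70, "TD": 0.10, "Articulation": 0.10, "Delay": 0.10},
    {"Atypical": 0.10, "TD": 0.80, "Articulation": 0.05, "Delay": 0.05},
    {"Atypical": 0.10, "TD": 0.50, "Articulation": 0.30, "Delay": 0.10},  # argmax is wrong
    {"Atypical": 0.20, "TD": 0.10, "Articulation": 0.10, "Delay": 0.60},
]
labels = ["Atypical", "TD", "Articulation", "Delay"]
print(accuracy(probs, labels))       # 0.75
print(macro_ovr_auc(probs, labels))  # 1.0
```

The toy case shows why a ranking metric such as AUC can sit above accuracy: the third child is misclassified by the argmax rule, yet within each one-vs-rest comparison the true class still receives the highest score.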
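
The abstract states that SHAP was used to attribute model predictions to individual features. SHAP builds on Shapley values from cooperative game theory: a feature's attribution is its marginal contribution to the model output, averaged with Shapley weights over all subsets of the other features. With only a few inputs these values can be computed exactly by enumeration, as in the sketch below. Features outside a subset are set to a single baseline value, which is one common convention; the `shap` package uses faster, model-specific algorithms and its own conventions for absent features. The scoring function `toy_risk_score`, its weights, the baseline, and the feature values are all hypothetical; only the feature names PhonVar, InfrVar, and PCC come from the abstract, and this is not the study's model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List

Features = Dict[str, float]

def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Exact Shapley values; features outside a coalition take baseline values."""
    names: List[str] = list(x)
    n = len(names)

    def value(coalition: frozenset) -> float:
        # Coalition features come from x, all others from the baseline.
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(mixed)

    phi: Features = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for every subset S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_risk_score(v: Features) -> float:
    # Hypothetical score: higher variance measures and lower PCC raise the score.
    return (0.8 * v["PhonVar"] + 1.5 * v["InfrVar"] - 0.05 * v["PCC"]
            + 0.5 * v["PhonVar"] * v["InfrVar"])

baseline = {"PhonVar": 1.0, "InfrVar": 0.0, "PCC": 90.0}  # hypothetical reference child
child = {"PhonVar": 3.0, "InfrVar": 2.0, "PCC": 60.0}     # hypothetical child
phi = shapley_values(toy_risk_score, child, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")  # PhonVar: +2.60, InfrVar: +5.00, PCC: +1.50
# Efficiency property: attributions sum to f(child) - f(baseline).
print(round(sum(phi.values()), 6), round(toy_risk_score(child) - toy_risk_score(baseline), 6))
```

The interaction term PhonVar × InfrVar is split evenly between the two features it involves, which is why their attributions exceed their purely linear contributions of 1.6 and 3.0.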
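
The abstract also reports that K-Means clustering divided the children into four subgroups and gives each subgroup's share of the sample. K-Means (Lloyd's algorithm) alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The pure-Python sketch below runs it on ten invented (PCC, InfrVar) pairs, z-scoring each feature first so that PCC's 0-100 scale does not dominate the distance, and then reports cluster shares as percentages. The data, the choice of features, and the deterministic first-k-points initialization are assumptions for illustration; practical implementations usually compare several randomized starts (for example, k-means++).

```python
from collections import Counter
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def standardize(points: Sequence[Point]) -> List[Point]:
    """Z-score each feature so PCC (0-100 scale) does not dominate the distance."""
    columns = list(zip(*points))
    means = [fmean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against constant features
    return [tuple((v - m) / s for v, m, s in zip(p, means, sds)) for p in points]

def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [points[i] for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(fmean(col) for col in zip(*members))
    return labels

# Hypothetical (PCC, InfrVar) pairs for ten children; invented for illustration.
children = [
    (95, 0.0), (75, 1.0), (60, 6.0), (50, 2.0), (93, 0.5),
    (96, 1.0), (92, 0.0), (78, 1.5), (73, 0.5), (58, 7.0),
]
labels = kmeans(standardize(children), k=4)
for cluster, count in sorted(Counter(labels).items()):
    # With this toy data: 40.00%, 30.00%, 20.00%, 10.00% for clusters 0-3.
    print(f"cluster {cluster}: {100 * count / len(labels):.2f}% of children")
```

K-Means returns unnamed clusters; attaching labels such as TD or Delay to them, as the abstract does, requires inspecting each cluster's feature profile afterwards.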

Similar Items