Facial Movements Extracted from Video for the Kinematic Classification of Speech

Bibliographic Details
Main Authors: Richard Palmer, Roslyn Ward, Petra Helmholz, Geoffrey R. Strauss, Paul Davey, Neville Hennessey, Linda Orton, Aravind Namasivayam
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Sensors
Subjects: digital biomarkers, kinematics, spatiotemporal profiling, Speech Sound Disorders
Online Access: https://www.mdpi.com/1424-8220/24/22/7235
collection DOAJ
description Speech Sound Disorders (SSDs) are prevalent communication problems in children that pose significant barriers to academic success and social participation. Accurate diagnosis is key to mitigating life-long impacts. We are developing a novel software solution, the Speech Movement and Acoustic Analysis Tracking (SMAAT) system, to facilitate rapid and objective assessment of the motor speech control issues underlying SSDs. This study evaluates the feasibility of using three-dimensional (3D) facial measurements, automatically extracted from video recorded by a single two-dimensional (2D) front-facing camera, for classifying speech movements. Videos were recorded of 51 adults and 77 children aged 3 to 4 years (all typically developing) saying 20 words from the mandibular and labial-facial levels of the Motor-Speech Hierarchy Probe Wordlist (MSH-PW). Measurements around the jaw and lips were automatically extracted from the 2D video frames using a state-of-the-art facial mesh detection and tracking algorithm, and each individual measurement was tested in a Leave-One-Out Cross-Validation (LOOCV) framework for its word classification performance. Statistics were evaluated at the α = 0.05 significance level, and several measurements were found to exhibit significant classification performance in both the adult and child cohorts. Importantly, measurements of depth indirectly inferred from the 2D video frames were among those found to be significant. The significant measurements were shown to match expectations of facial movements across the 20 words, demonstrating their potential applicability in supporting clinical evaluations of speech production.
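The per-measurement LOOCV word-classification test described in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the synthetic data, the feature (a single kinematic measurement per speaker per word), and the choice of a k-nearest-neighbours classifier are all assumptions.

```python
# Hypothetical sketch: test one kinematic measurement's ability to
# classify which word was spoken, using Leave-One-Out Cross-Validation.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_speakers, n_words = 12, 5  # toy stand-ins for the study's 51/77 speakers and 20 words

# Simulate one measurement (e.g. peak jaw displacement) per speaker per word,
# with word-dependent means so the measurement carries some class information.
X = rng.normal(loc=np.arange(n_words), scale=0.5,
               size=(n_speakers, n_words)).reshape(-1, 1)
y = np.tile(np.arange(n_words), n_speakers)  # word labels

loo = LeaveOneOut()
correct = 0
for train_idx, test_idx in loo.split(X):
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])

accuracy = correct / len(y)
chance = 1 / n_words
print(f"LOOCV accuracy: {accuracy:.2f} (chance = {chance:.2f})")
```

In the study, held-out accuracy per measurement would then be compared against chance-level performance at the α = 0.05 significance level; the statistical test used for that comparison is not specified in this record.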
id doaj-art-0a6dd034fba74f53a0b5a4d0e3ce9d3f
institution Kabale University
issn 1424-8220
doi 10.3390/s24227235
affiliations:
Richard Palmer: School of Earth and Planetary Sciences, Curtin University, Perth, WA 6102, Australia
Roslyn Ward: School of Allied Health, Curtin University, Perth, WA 6102, Australia
Petra Helmholz: School of Earth and Planetary Sciences, Curtin University, Perth, WA 6102, Australia
Geoffrey R. Strauss: School of Allied Health, Curtin University, Perth, WA 6102, Australia
Paul Davey: School of Allied Health, Curtin University, Perth, WA 6102, Australia
Neville Hennessey: School of Allied Health, Curtin University, Perth, WA 6102, Australia
Linda Orton: School of Allied Health, Curtin University, Perth, WA 6102, Australia
Aravind Namasivayam: Department of Speech-Language Pathology, University of Toronto, Toronto, ON M5G 1V7, Canada
title Facial Movements Extracted from Video for the Kinematic Classification of Speech
topic digital biomarkers
kinematics
spatiotemporal profiling
Speech Sound Disorders